
Dissertations / Theses on the topic 'Analysis of item difficulty'


Consult the top 50 dissertations / theses for your research on the topic 'Analysis of item difficulty.'


1

ISHII, Hidetoki, Kazuhiro YASUNAGA, 秀宗 石井, and 和央 安永. "国語読解テストにおける設問文中の単語の難しさが能力評価に及ぼす影響 : 具体例を回答させる設問の検討 [The influence of the difficulty of words in question sentences on ability assessment in a Japanese reading comprehension test: an examination of questions that ask for concrete examples]." 名古屋大学大学院教育発達科学研究科, 2012. http://hdl.handle.net/2237/16163.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Kim-O, Mee-Ae (Mia). "Analysis of item difficulty and change in mathematical achievement from 6th to 8th grade's longitudinal data." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41194.

Full text
Abstract:
Mathematics is an increasingly important aspect of education because of its central role in technology. Learning mathematics at the elementary and middle school levels forms the basis for achievement in high school and college mathematics, and for the broad range of mathematical skills used in the workplace. The middle school years (Grade 6 to Grade 8) are especially crucial to success in mathematics because students must acquire the skills needed at higher levels of mathematics and the complex reasoning ability described in developmental perspectives on cognition (e.g., Piaget, Vygotsky). The purpose of the current study was to measure and interpret growth in mathematical achievement during the middle school years using recent advances in item response theory (confirmatory multidimensional and longitudinal models). It was found that the relative strength of the content areas (mathematical standards and benchmarks) shifted somewhat across grades in defining mathematical achievement. The largest growth occurred from Grade 6 to Grade 7. The specific pattern of growth varied substantially by the socio-economic status of the student, but few differences emerged by gender. The implications of the results for education and for developmental theories of cognitive complexity are discussed.
APA, Harvard, Vancouver, ISO, and other styles
3

Nishitani, Atsuko. "A Hierarchy of Grammatical Difficulty for Japanese EFL Learners: Multiple-Choice Items and Processability Theory." Diss., Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/176422.

Full text
Abstract:
CITE/Language Arts
Ed.D.
This study investigated the difficulty order of 38 grammar structures obtained from an analysis of multiple-choice items using a Rasch analysis. The order was compared with the order predicted by processability theory and the order in which the structures appear in junior and senior high school textbooks in Japan. Because processability theory is based on natural speech data, a sentence repetition test was also conducted in order to compare its results with the order obtained from the multiple-choice tests and the order predicted by processability theory. The participants were 872 Japanese university students, whose TOEIC scores ranged from 200 to 875. The difficulty order of the 38 structures was displayed according to their Rasch difficulty estimates: the most difficult structure was the subjunctive and the easiest was the present perfect with 'since' in the sentence. The order was not in accord with the order predicted by processability theory, and the difficulty order derived from the sentence repetition test was not accounted for by processability theory either. In other words, the results suggest that processability theory only accounts for natural speech data, and not elicited data. Although the order derived from the repetition test differed from the order derived from the written tests, the two correlated strongly when the repetition test used ungrammatical sentences. This study tentatively concluded that the students could have used their implicit knowledge when answering the written tests, but it is also possible that students used their explicit knowledge when correcting ungrammatical sentences in the repetition test. The difficulty order of grammatical structures derived from this study was not in accord with the order in which the structures appear in junior and senior high school textbooks in Japan. The correlation between the two was extremely low, which suggests that there is no empirical basis for textbook makers'/writers' policy regarding the ordering of grammar items. This study also demonstrated the difficulty of writing items that test knowledge of the same grammar point and show similar Rasch difficulty estimates. Even though the vocabulary and the sentence positions were carefully controlled and the two items looked parallel to teachers, such items often displayed very different difficulty estimates. A questionnaire administered about such items suggested that students look at the items differently than teachers do, and that what they notice, and how they interpret what they notice, strongly influences item difficulty. Teachers and test-writers should be aware that it is difficult to write items that produce similar difficulty estimates, and that their own intuition or experience might not be the best guide for writing effective grammar test items. Piloting test items is recommended in order to obtain statistical information about item functioning as well as qualitative data from students through a think-aloud protocol, interviews, or a questionnaire.
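
As an illustrative aside, the comparison described here, between a Rasch-based difficulty ordering and the order in which textbooks introduce the same structures, can be sketched with a rank correlation. The structures and values below are hypothetical placeholders, not the study's estimates:

    # Hedged sketch: rank-correlating an empirical difficulty ordering with a
    # textbook presentation order (hypothetical values, not the study's data).
    from scipy.stats import spearmanr

    rasch_difficulty = {                      # higher = harder (logits)
        "present perfect with 'since'": -1.8,
        "comparative adjectives": -0.9,
        "passive voice": 0.2,
        "relative clauses": 0.7,
        "subjunctive": 2.1,
    }
    textbook_order = {                        # 1 = introduced first
        "present perfect with 'since'": 4,
        "comparative adjectives": 1,
        "passive voice": 2,
        "relative clauses": 3,
        "subjunctive": 5,
    }

    structures = list(rasch_difficulty)
    rho, p = spearmanr([rasch_difficulty[s] for s in structures],
                       [textbook_order[s] for s in structures])
    print(f"Spearman rho between difficulty and textbook order: {rho:.2f} (p = {p:.3f})")
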
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
4

Lin, Peng. "IRT vs. factor analysis approaches in analyzing multigroup multidimensional binary data the effect of structural orthogonality, and the equivalence in test structure, item difficulty, & examinee groups /." College Park, Md.: University of Maryland, 2008. http://hdl.handle.net/1903/8468.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2008.
Thesis research directed by: Dept. of Measurement, Statistics and Evaluation. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
5

Hashimoto, Brett James. "Rethinking Vocabulary Size Tests: Frequency Versus Item Difficulty." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/5958.

Full text
Abstract:
For decades, vocabulary size tests have been built upon the idea that if a test-taker knows enough words at a given level of frequency, based on a list from a corpus, they will also know other words of approximately that frequency as well as all words that are more frequent. However, many vocabulary size tests are based on corpora that are as much as 70 years old and that may be ill-suited for these tests. Based on these potentially problematic areas, the following research questions were asked. First, to what degree would a vocabulary size test based on a large, contemporary corpus be reliable and valid? Second, would it be more reliable and valid than previously designed vocabulary size tests? Third, do words across 1,000-word frequency bands vary in their item difficulty? In order to answer these research questions, 403 ESL learners took the Vocabulary of American English Size Test (VAST). This test was based on a word list generated from the Corpus of Contemporary American English (COCA). This thesis shows that the COCA word list might be better suited for measuring vocabulary size than lists used in previous vocabulary size assessments. As a 450-million-word corpus, it far surpasses any corpus used in previously designed vocabulary size tests in terms of size, balance, and representativeness. The vocabulary size test built from the COCA list was both highly valid and highly reliable according to a Rasch-based analysis: Rasch person reliability and separation were 0.96 and 4.62, respectively. However, the most significant finding of this thesis is that frequency ranking in a word list is not as good a predictor of item difficulty in a vocabulary size assessment as researchers had previously assumed. A Pearson correlation between frequency ranking in the COCA list and item difficulty for 501 items taken from the 5,000 most frequent words was 0.474 (r^2 = 0.225), meaning that frequency rank accounted for only 22.5% of the variability in item difficulty. The correlation decreased greatly when item difficulty was correlated against bands of 1,000 words, to a weak r = 0.306 (r^2 = 0.094), meaning that 1,000-word frequency bands account for only 9.4% of the variance. Because frequency is not a highly accurate predictor of item difficulty, it is important to reconsider how vocabulary size tests are designed.
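
A minimal sketch of how the reported r and r^2 are obtained, using simulated frequency ranks and difficulties rather than the VAST data; the variable names and the noise model are assumptions for illustration only:

    # Hedged sketch: correlating corpus frequency rank with item difficulty.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    n_items = 501                                # items drawn from the 5,000 most frequent words
    freq_rank = rng.integers(1, 5001, n_items)   # hypothetical frequency ranks
    # Difficulty loosely tied to rank plus substantial noise, mimicking a weak relationship.
    difficulty = 0.0005 * freq_rank + rng.normal(0.0, 1.0, n_items)

    r, p = pearsonr(freq_rank, difficulty)
    print(f"r = {r:.3f}, r^2 = {r**2:.3f}")      # share of difficulty variance explained by rank
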
APA, Harvard, Vancouver, ISO, and other styles
6

Li, Yong "Isaac." "Extending the Model with Internal Restrictions on Item Difficulty (MIRID) to Study Differential Item Functioning." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/6724.

Full text
Abstract:
Differential item functioning (DIF) is a psychometric issue routinely considered in educational and psychological assessment. However, it has not been studied in the context of a recently developed componential statistical model, the model with internal restrictions on item difficulty (MIRID; Butter, De Boeck, & Verhelst, 1998). Because the MIRID requires test questions measuring either single or multiple cognitive processes, it creates a complex environment for which traditional DIF methods may be inappropriate. This dissertation sought to extend the MIRID framework to detect DIF at the item-group level and the individual-item level. Such a model-based approach can increase the interpretability of DIF statistics by focusing on item characteristics as potential sources of DIF. In particular, group-level DIF may reveal comparative group strengths in certain secondary constructs. A simulation study was conducted to examine under different conditions parameter recovery, Type I error rates, and power of the proposed approach. Factors manipulated included sample size, magnitude of DIF, distributional characteristics of the groups, and the MIRID DIF models corresponding to discrete sources of differential functioning. The impact of studying DIF using wrong models was investigated. The results from the recovery study of the MIRID DIF model indicate that the four delta (i.e., non-zero value DIF) parameters were underestimated whereas item locations of the four associated items were overestimated. Bias and RMSE were significantly greater when delta was larger; larger sample size reduced RMSE substantially while the effects from the impact factor were neither strong nor consistent. Hypothesiswise and adjusted experimentwise Type I error rates were controlled in smaller delta conditions but not in larger delta conditions as estimates of zero-value DIF parameters were significantly different from zero. Detection power of the DIF model was weak. Estimates of the delta parameters of the three group-level DIF models, the MIRID differential functioning in components (DFFc), the MIRID differential functioning in item families (DFFm), and the MIRID differential functioning in component weights (DFW), were acceptable in general. They had good hypothesiswise and adjusted experimentwise Type I error control across all conditions and overall achieved excellent detection power. When fitting the proposed models to mismatched data, the false detection rates were mostly beyond the Bradley criterion because the zero-value DIF parameters in the mismatched model were not estimated adequately, especially in larger delta conditions. Recovery of item locations and component weights was also not adequate in larger delta conditions. Estimation of these parameters was more or less affected adversely by the DIF effect simulated in the mismatched data. To study DIF in MIRID data using the model-based approach, therefore, more research is necessary to determine the appropriate procedure or model to implement, especially for item-level differential functioning.
APA, Harvard, Vancouver, ISO, and other styles
7

Young, Candice Marie. "The Influence of Person and Item Characteristics on the Detection of Item Insensitivity." University of Akron / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=akron1302138491.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Troyka, Rebecca J. "An investigation of item difficulty in the Stanford-Binet intelligence scale, fourth edition." Virtual Press, 1989. http://liblink.bsu.edu/uhtbin/catkey/560300.

Full text
Abstract:
Introduced in 1986, the Stanford-Binet Intelligence Scale: Fourth Edition differs radically from its predecessors. Because of the adaptive testing format and the limited number of items given to each subject, it is especially important that consecutive levels in each of the tests increase in difficulty. The purpose of this study was to investigate the progression of difficulty among items in the Fourth Edition. Three hundred sixty-four subjects from Indiana who ranged in age from 3 years, 0 months to 23 years, 4 months were administered the Fourth Edition. The study was limited to those subjects earning a Composite SAS Score at or above 68. Data were presented to indicate trends in the difficulty of each item as well as in the difficulty of each level in the Fourth Edition. Three research questions were answered: 1) Are the items at each level equally difficult? 2) Are the levels in each test arranged so that the level with the least difficult items comes first, followed by levels with more and more difficult items? 3) In each test, is an item easier for subjects who entered at a higher level than it is for subjects who entered at a lower level? The results supported the hypotheses, confirming that the Fourth Edition is a solidly constructed test in terms of item difficulty levels. Most item pairs within a level were found to be approximately equal in difficulty. Nearly all of the levels in each test were followed by increasingly difficult levels. With very few exceptions, each item was found to be more difficult for subjects entering at a lower level than for those entering at a higher level. For the few discrepancies found, there was no reason to believe that they were caused by anything other than chance.
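
A minimal sketch of the classical item difficulty (proportion passing) comparison behind research question 3, with made-up response vectors standing in for the study's data:

    # Hedged sketch: classical p-values for one item, split by entry level.
    import numpy as np

    responses_by_entry_level = {                 # 1 = passed, 0 = failed
        "entered at a lower level":  np.array([1, 0, 0, 1, 0, 0, 1, 0]),
        "entered at a higher level": np.array([1, 1, 0, 1, 1, 1, 0, 1]),
    }

    for entry_level, responses in responses_by_entry_level.items():
        p_value = responses.mean()               # proportion answering the item correctly
        print(f"{entry_level}: p = {p_value:.2f} (n = {responses.size})")
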
Department of Educational Psychology
APA, Harvard, Vancouver, ISO, and other styles
9

野口, 裕之, and Hiroyuki NOGUCHI. "<原著>項目困難度の分布の偏りが IRT 項目パラメタの発見的推定値に与える影響 [The effect of skewed item difficulty distributions on heuristic estimates of IRT item parameters]." 名古屋大学教育学部, 1992. http://hdl.handle.net/2237/3870.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Curtin, Joseph A. "Testing the Assumption of Sample Invariance of Item Difficulty Parameters in the Rasch Rating Scale Model." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2081.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Schleicher-Dilks, Sara Ann. "Exploring the Item Difficulty and Other Psychometric Properties of the Core Perceptual, Verbal, and Working Memory Subtests of the WAIS-IV Using Item Response Theory." Diss., NSUWorks, 2015. https://nsuworks.nova.edu/cps_stuetd/87.

Full text
Abstract:
The ceiling and basal rules of the Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV; Wechsler, 2008) only function as intended if subtest items proceed in order of difficulty. While many aspects of the WAIS-IV have been researched, there is no literature about subtest item difficulty, and precise item difficulty values are not available. The WAIS-IV was developed within the framework of Classical Test Theory (CTT), and item difficulty was most often determined using p-values. One limitation of this method is that item difficulty values are sample dependent. Both the standard error of measurement, an important indicator of reliability, and p-values change when the sample changes. A different framework within which psychological tests can be created, analyzed, and refined is Item Response Theory (IRT). IRT places items and person ability onto the same scale using linear transformations and links item difficulty level to person ability. As a result, IRT is said to produce sample-independent statistics. Rasch modeling, a form of IRT, is a one-parameter logistic model that is appropriate for items with only two response options and assumes that the only factors affecting test performance are characteristics of items, such as their difficulty level or their relationship to the construct being measured by the test, and characteristics of participants, such as their ability levels. The partial credit model is similar to the standard dichotomous Rasch model, except that it is appropriate for items with more than two response options. Proponents of the standard dichotomous Rasch model argue that it has distinct advantages over both CTT-based methods and other IRT models (Bond & Fox, 2007; Embretson & Reise, 2000; Furr & Bacharach, 2013; Hambleton & Jones, 1993) because of the principle of monotonicity, also referred to as specific objectivity or the principle of additivity or double cancellation, which "establishes that two parameters are additively related to a third variable" (Embretson & Reise, 2000, p. 148). In other words, because of the principle of monotonicity, in Rasch modeling the probability of correctly answering an item is an additive function of an individual's ability, or trait level, and the item's degree of difficulty. As ability increases, so does an individual's probability of answering that item correctly. Because only item difficulty and person ability affect an individual's chance of correctly answering an item, inter-individual comparisons can be made even if individuals did not receive identical items or items of the same difficulty level. This is why Rasch modeling is referred to as test-free measurement. The purpose of this study was to apply a standard dichotomous Rasch model or partial credit model to the individual items of the core perceptual, verbal, and working memory subtests of the WAIS-IV: Block Design, Matrix Reasoning, Visual Puzzles, Similarities, Vocabulary, Information, Arithmetic, Digits Forward, Digits Backward, and Digit Sequencing. Results revealed that WAIS-IV subtests fall into one of three categories: optimally ordered, near optimally ordered, and sub-optimally ordered. Optimally ordered subtests, Digits Forward and Digits Backward, had no disordered items. Near optimally ordered subtests were those with one to three disordered items and included Digit Sequencing, Arithmetic, Similarities, and Block Design. Sub-optimally ordered subtests consisted of Matrix Reasoning, Visual Puzzles, Information, and Vocabulary, with the number of disordered items ranging from six to 16. Two major implications of the results of this study were considered: the impact on individuals' scores and the impact on overall test administration time. While the number of disordered items ranged from 0 to 16, the overall impact on raw scores was deemed minimal. Because of where the disordered items occur in the subtests, most individuals are administered all the items that they would be expected to answer correctly. A one-point reduction in any one subtest is unlikely to significantly affect overall index scores, which are the scores most commonly interpreted in the WAIS-IV. However, if an individual received a one-point reduction across all subtests, this may have a more noticeable impact on index scores. In cases where individuals discontinue before having a chance to answer items that were easier, clinicians may consider testing the limits. While this would have no impact on raw scores, it may provide clinicians with a better understanding of individuals' true abilities. Based on the findings of this study, clinicians may consider administering only certain items in order to test the limits, based on the items' difficulty values. This study found that the start point for most subtests is too easy for most individuals. For some subtests, most individuals may be administered more than 10 items that are too easy for them. Other than increasing overall administration time, it is not clear what impact, if any, this has. However, it does suggest the need to reevaluate current start items so that they are the true basal for most people. Future studies should break standard test administration by ignoring basal and ceiling rules to collect data on more items. In order to help clarify why some items are more or less difficult than would be expected given their ordinal rank, future studies should include a qualitative aspect, where, after each subtest, individuals are asked to describe what they found easy and difficult about each item. Finally, future research should examine the effects of item ordering on participant performance. While this study revealed that only minimal reductions in index scores are likely to result from prematurely stopping test administration, it is not known whether disordering has other impacts on performance, perhaps by increasing or decreasing an individual's confidence.
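
For reference, the dichotomous Rasch model described above can be written in its standard form (a general statement of the model, not a result specific to the WAIS-IV analysis): P(X_pi = 1 | theta_p, b_i) = exp(theta_p - b_i) / (1 + exp(theta_p - b_i)), where theta_p is person p's ability and b_i is item i's difficulty. The probability of a correct response depends only on the difference theta_p - b_i and rises monotonically with it, which is the monotonicity property the abstract refers to.
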
APA, Harvard, Vancouver, ISO, and other styles
12

Luger, Sarah Kaitlin Kelly. "Algorithms for assessing the quality and difficulty of multiple choice exam questions." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20986.

Full text
Abstract:
Multiple Choice Questions (MCQs) have long been the backbone of standardized testing in academia and industry. Correspondingly, there is a constant need for the authors of MCQs to write and refine new questions for new versions of standardized tests, as well as to support measuring performance in the emerging massive open online courses (MOOCs). Research that explores what makes a question difficult, or which questions distinguish higher-performing students from lower-performing students, can aid in the creation of the next generation of teaching and evaluation tools. In the automated MCQ answering component of this thesis, algorithms query for definitions of scientific terms, process the returned web results, and compare the returned definitions to the original definition in the MCQ. This automated method for answering questions is then augmented with a model, based on human performance data from crowdsourced question sets, for analysis of question difficulty as well as the discrimination power of the non-answer alternatives. The crowdsourced question sets come from PeerWise, an open source online college-level question authoring and answering environment. The goal of this research is to create an automated method to both answer and assess the difficulty of multiple choice inverse definition questions in the domain of introductory biology. The results of this work suggest that human-authored question banks provide useful data for building gold standard human performance models. The methodology for building these performance models has value in other domains that test the difficulty of questions and the quality of the exam takers.
APA, Harvard, Vancouver, ISO, and other styles
13

Korir, Daniel K. "The effects of item difficulty and examinee ability on the distribution and effectiveness of LZ and ECIZ4 appropriateness indices." Thesis, University of Ottawa (Canada), 1992. http://hdl.handle.net/10393/10488.

Full text
Abstract:
Test scores are intended to provide a measure of an examinee's ability. High ability examinees are expected to get few easy items wrong and low ability examinees are expected to get few difficult items right. But there are occasions when the test-taking behavior of some atypical examinees may be so unusual that their test scores cannot be regarded as an appropriate measure of ability. An atypical examinee can have a spuriously low or a spuriously high score. However, appropriateness indices can be used to identify examinees with potentially inaccurate total scores. Appropriateness indices provide quantitative measures of response pattern atypicality. These indices fall into two major categories: (a) IRT-based and (b) non-IRT based indices. The dependency of non-IRT based indices on the item difficulty order of a particular group has rendered them inadequate for detecting aberrant response patterns. IRT-based indices are group invariant. Researchers have investigated the effectiveness and the distributions of these indices under varying conditions of testing. However, some test situations might require efficient and accurate indices of appropriateness measurement for restricted samples. It might be helpful, for example, to accurately identify examinees with potentially spuriously low scores falling just below the criterion of a minimum competency test; on a certification test, it might be helpful to concentrate on identifying examinees with spuriously high scores. Therefore, the effects of item difficulty and examinee ability distributions on the effectiveness and the distributional characteristics of LZ and ECIZ4 (IRT-based) appropriateness indices were investigated in this study. To examine the effects of item difficulty and ability distributions on the distributional characteristics of LZ and ECIZ4, data were generated in nine combinations of item difficulty and ability distributions to simulate the responses of 2000 examinees to 60 test items according to the three-parameter model. Three uniform distributions of item difficulty were used. Items typical of diagnostic tests were generated in the interval -3.0 to +1.2; items typical of power tests were generated in the interval -3.0 to +3.0; and items typical of certification and licensing tests were generated in the interval -1.2 to +3.0. Three distributions of ability were used. Thetas typical of low, medium, and high ability examinees were generated to have normal distributions with means of -1.2, 0.0, and +1.2 respectively, each with a standard deviation of 0.6. The mean, standard deviation, skewness, kurtosis, and percentile estimates of LZ and ECIZ4 were significantly affected by the variations of item difficulty and ability distributions. The distributions of the two indices approximated a normal distribution when the ability estimates matched the item difficulty. Overall, the distribution of LZ approximated a normal distribution better than the distribution of ECIZ4. To examine the effectiveness of LZ and ECIZ4 in detecting aberrant response patterns, two samples, each consisting of 500 response patterns (for spuriously low and spuriously high scores), were generated for each of the nine combinations of item difficulty and ability distribution and subjected to spurious treatments. Twenty percent and 10% spuriously high scores were created by randomly selecting 20% or 10% of the original responses and changing incorrect answers to correct.
Twenty percent and 10% spuriously low scores were created by randomly selecting 20% or 10% of the original responses and changing correct answers to incorrect. The percentile estimates obtained were used as cutoff points to classify response patterns as aberrant or non-aberrant. Spuriously low aberrant response patterns were easier to detect by the two indices under the low item difficulty and spuriously high aberrant response patterns were easier to detect under high item difficulty. At low (0.01 and 0.05) false positive rates, LZ had higher detection rates of spuriously high and spuriously low aberrant response patterns than ECIZ4 under the high item difficulty; and ECIZ4 had higher detection rates than LZ under the medium and under the low item difficulty. Twenty percent treatment samples were easier to detect by the two indices than the 10% treatment samples.
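
A minimal sketch of the lz person-fit statistic for one simulated response pattern, using the standard Drasgow, Levine, and Williams (1985) formulas under a three-parameter logistic model; the item parameters and ability value are illustrative assumptions, not the dissertation's simulation design:

    # Hedged sketch: lz person-fit statistic under a 3PL model.
    import numpy as np

    rng = np.random.default_rng(1)
    n_items = 60
    a = rng.uniform(0.8, 2.0, n_items)           # discrimination
    b = rng.uniform(-3.0, 3.0, n_items)          # difficulty
    c = rng.uniform(0.10, 0.25, n_items)         # pseudo-guessing
    theta = 0.0                                  # examinee ability

    p = c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))   # 3PL probabilities
    u = (rng.random(n_items) < p).astype(int)                # simulated item responses

    log_lik = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    lz = (log_lik - expected) / np.sqrt(variance)            # large negative values flag aberrance
    print(f"lz = {lz:.2f}")
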
APA, Harvard, Vancouver, ISO, and other styles
14

Cash, Charles R. "Stochastic Analysis of Multi-Item Flow Lines /." The Ohio State University, 1996. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487931993468164.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Cheuvront, Melinda Lee. "Analysis of sensitivity and comprehension difficulty among expository text structures." Thesis, Boston University, 2002. https://hdl.handle.net/2144/33425.

Full text
Abstract:
Thesis (Ed.D.)--Boston University
The purpose of this study was to examine fourth-grade students' sensitivity to and comprehension of expository text structures. It was hypothesized that a hierarchy of student sensitivity and comprehension difficulty among expository text structures exists. A total of 83 fourth-grade students from two Boston private schools read a passage, performed an immediate written recall, and answered eight comprehension questions. Two days later, students again performed a written recall. This procedure was repeated for four weeks, each time in response to a passage addressing a similar topic with a different text structure (compare-contrast [CC], cause-effect [CE], description [D], or problem-solution [PS]). Microstructures, such as reading levels and sentence complexity, were controlled across all passages. A counterbalanced design of both text structure and topic controlled for order effects. Four important findings related to students' sensitivity to and comprehension of expository text structures emerged. First, on immediate recalls the students were significantly less sensitive to the D text structure than they were to both the CC and PS structures. On delayed recalls, students were less sensitive to the D structure than the PS structure. Second, students scored significantly lower on the comprehension task for the D text structure than the CC and PS structures. Third, high ability readers outperformed average and low readers on all text structures except for D, in which they performed significantly better than only the low group. Fourth, students who were more sensitive to text structure recalled significantly more superordinate ideas. In fact, sensitivity to text structure explained nearly 70% of the variability in the percentage of superordinate ideas recalled. In summary, the data support the conclusion that a hierarchy of difficulty among text structures does indeed exist. In addition, students who were more sensitive to text structure also recalled significantly more superordinate ideas. These findings culminate a quarter century of research illustrating the influence of text structure on student comprehension and could impact how curricula and textbooks emphasize and sequence the instruction and assessment of expository text structures.
APA, Harvard, Vancouver, ISO, and other styles
16

Siller, Klaus [Verfasser]. "Predicting Item Difficulty in a Reading Test : A Construct Identification Study of the Austrian 2009 Baseline English Reading Test / Klaus Siller." Frankfurt a.M. : Peter Lang GmbH, Internationaler Verlag der Wissenschaften, 2020. http://d-nb.info/1209451336/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Jensen, Jennifer Lynn. "An Item Reduction Analysis of the Group Questionnaire." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/5988.

Full text
Abstract:
The Group Questionnaire (GQ) was developed to measure group therapeutic processes, which are linked to successful prediction of patient outcome and to therapeutic factors, across three qualitative dimensions (positive bond, positive work, and negative relationship) and three structural dimensions (member-leader, member-member, and member-group). The GQ model has been shown to be valid across 5 settings and 4 countries. Because the GQ is a clinical measure given after each session, its length is of particular concern. Although shorter measures are more convenient for clients and therapists to use, fewer items necessarily mean less information, weaker psychometric properties, and possible floor and ceiling effects. This study examined the effects of shortening the GQ on its clinical utility and psychometric integrity. Methods. Archival data from 7 previous studies were used, with 2,594 participants in an estimated 455 groups gathered from counseling centers, non-clinical process groups, inpatient psychiatric hospitals, outpatient psychiatric hospitals, and an inpatient state hospital. Participants answered questions from the Group Questionnaire administered during the productive working phase of a group. Analysis. Analysis was done using multilevel structural equation modeling in Mplus to account for the nested nature of groups. Items were selected using clinical judgment and statistical judgment, considering inter-item correlations and factor loadings. Model fit was analyzed in comparison to the standards in the literature and in comparison to the full-length GQ. Discussion. The revised 12-item GQ has good model fit and acceptable reliability. Further assessment is needed to determine how the reduction affects clinical utility.
APA, Harvard, Vancouver, ISO, and other styles
18

Cohen, Michael S. "How do subjects use judgments of item difficulty to guide study strategies in selection of spaced or massed practice? A comparison of theories." 2007. http://proquest.umi.com/pqdweb?did=1338915391&sid=1&Fmt=2&clientId=3260&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Peabody, Michael R. "EFFECTS OF ITEM-LEVEL FEEDBACK ON THE RATINGS PROVIDED BY JUDGES IN A MODIFIED-ANGOFF STANDARD SETTING STUDY." UKnowledge, 2014. http://uknowledge.uky.edu/edsc_etds/2.

Full text
Abstract:
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations, and although all cut score decisions are by nature arbitrary, they should not be capricious. Establishing a minimum passing standard is the technical expression of a policy decision, and the information gained through standard setting studies informs these policy decisions. To this end, it is necessary to conduct robust examinations of methods and techniques commonly applied to standard setting studies in order to better understand issues that may influence policy decisions. The modified-Angoff method remains one of the most popular methods for setting performance standards in testing and assessment. With this method, it is common practice to provide content experts with feedback regarding the item difficulties; however, it is unclear how this feedback affects the ratings and recommendations of content experts. Recent research seems to indicate mixed results, noting that the feedback given to raters may or may not alter their judgments depending on the type of data provided, when the data were provided, and how raters collaborated within and between groups. This research seeks to examine issues related to the effects of item-level feedback on the judgment of raters. The results suggest that the most important factor related to item-level feedback is whether or not a Subject Matter Expert (SME) was able to correctly answer a question. If so, then the SMEs tended to rely on their own inherent sense of item difficulty rather than the data provided, in spite of empirical evidence to the contrary. The results of this research may hold implications for how standard setting studies are conducted with regard to the difficulty and ordering of items, the ability level of content experts invited to participate in these studies, and the types of feedback provided.
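
As background, a hypothetical sketch of how modified-Angoff ratings are commonly aggregated into a cut score: each SME estimates, item by item, the probability that a minimally competent candidate answers correctly, and the panel cut score is typically the mean of the judges' summed ratings. The numbers below are invented for illustration:

    # Hedged sketch: aggregating modified-Angoff probability ratings.
    import numpy as np

    ratings = np.array([          # rows = SMEs, columns = items
        [0.60, 0.45, 0.80, 0.70, 0.55],
        [0.65, 0.50, 0.75, 0.60, 0.50],
        [0.55, 0.40, 0.85, 0.65, 0.60],
    ])

    per_judge_cut = ratings.sum(axis=1)      # each SME's implied raw cut score
    panel_cut = per_judge_cut.mean()         # panel recommendation
    print(per_judge_cut, f"panel cut score = {panel_cut:.2f} out of {ratings.shape[1]} points")
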
APA, Harvard, Vancouver, ISO, and other styles
20

Andrade, Maria Florentina Alves Gomes Lopes de. "Exames nacionais: a influência da tipologia dos itens nos resultados das provas de biologia e geologia." Master's thesis, Universidade de Évora, 2012. http://hdl.handle.net/10174/14840.

Full text
Abstract:
Enquadrado na problemática dos exames nacionais, este estudo teve como principal objetivo verificar se a tipologia dos itens interfere nos resultados do exame nacional de Biologia e Geologia. Procurou-se, também, conhecer as perceções dos alunos relativamente ao grau de dificuldade dos itens dos exames nacionais realizados em 2011 e as razões dessas dificuldades. A recolha de dados foi feita por aplicação de um questionário a 106 alunos de quatro escolas do distrito de Évora e com base em documentos do GAVE, onde constam os resultados por item, de cada escola. Os dados revelam que os itens de resposta restrita são os que se apresentam com resultados médios mais baixos, sendo também percecionados como os mais difíceis. Os melhores resultados correspondem aos itens de associação/correspondência. As principais razões de dificuldade prendem-se com relacionar os conhecimentos com as informações fornecidas (textos/tabelas/gráficos/esquemas) e a complexidade da matéria; ABSTRACT: Framed on the issue of national exams, this study aimed to verify if the types of items affect the results of the national exam of Biology and Geology. We tried, also, to know the perceptions of students in relation to the degree of difficulty of the items of national exams conducted in 2011, and the reasons for these difficulties. Data collection was performed by administering a questionnaire to 106 students from four schools in the district of Évora and based on documents of GAVE, which contains the results by item, for each school. Data reveal that restricted response items had lower average results, and are also the most difficult in students’ perceptions. The best results correspond to items of association/correspondence. The main reasons of difficulty have to do with relate knowledge with the information provided (text/tables/charts/diagrams) and the complexity of the subject.
APA, Harvard, Vancouver, ISO, and other styles
21

Banda, Asiana. "ZAMBIAN PRE-SERVICE JUNIOR HIGH SCHOOL SCIENCE TEACHERS' CHEMICAL REASONING AND ABILITY." OpenSIUC, 2014. https://opensiuc.lib.siu.edu/dissertations/796.

Full text
Abstract:
The purpose of this study was two-fold: to examine junior high school pre-service science teachers' chemical reasoning, and to establish the extent to which the pre-service science teachers' chemical abilities explain their chemical reasoning. The sample comprised 165 junior high school pre-service science teachers at Mufulira College of Education in Zambia. There were 82 males and 83 females. Data were collected using a Chemical Concept Reasoning Test (CCRT). Pre-service science teachers' chemical reasoning was established through qualitative analysis of their responses to test items. The Rasch model was used to determine the pre-service teachers' chemical abilities and item difficulty. Results show that most pre-service science teachers had incorrect chemical reasoning on the chemical concepts assessed in this study. There was no significant difference in chemical understanding between the Full-Time and Distance Education pre-service science teachers, or between second and third year pre-service science teachers. However, there was a significant difference in chemical understanding between male and female pre-service science teachers. Male pre-service science teachers showed better chemical understanding than female pre-service science teachers. The Rasch model revealed that the pre-service science teachers had low chemical abilities, and the CCRT was very difficult for this group of pre-service science teachers. As such, their incorrect chemical reasoning was attributed to their low chemical abilities. These results have implications for science teacher education, chemistry teaching and learning, and chemical education research.
APA, Harvard, Vancouver, ISO, and other styles
22

Sim, Stacy. "An Item Response Theory Analysis of CWB Measurement Artifacts." Bowling Green State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1478003731122816.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Wang, Wenjia. "Item Response Theory in the Neurodegenerative Disease Data Analysis." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0624/document.

Full text
Abstract:
Les maladies neurodégénératives, telles que la maladie d'Alzheimer (AD) et Charcot Marie Tooth (CMT), sont des maladies complexes. Leurs mécanismes pathologiques ne sont toujours pas bien compris et les progrès dans la recherche et le développement de nouvelles thérapies potentielles modifiant la maladie sont lents. Les données catégorielles, comme les échelles de notation et les données sur les études d'association génomique (GWAS), sont largement utilisées dans les maladies neurodégénératives dans le diagnostic, la prédiction et le suivi de la progression. Il est important de comprendre et d'interpréter ces données correctement si nous voulons améliorer la recherche sur les maladies neurodégénératives. Le but de cette thèse est d'utiliser la théorie psychométrique moderne: théorie de la réponse d’item pour analyser ces données catégoriques afin de mieux comprendre les maladies neurodégénératives et de faciliter la recherche de médicaments correspondante. Tout d'abord, nous avons appliqué l'analyse de Rasch afin d'évaluer la validité du score de neuropathie Charcot-Marie-Tooth (CMTNS), un critère important d'évaluation principal pour les essais cliniques de la maladie de CMT. Nous avons ensuite adapté le modèle Rasch à l'analyse des associations génétiques pour identifier les gènes associés à la maladie d'Alzheimer. Cette méthode résume les génotypes catégoriques de plusieurs marqueurs génétiques tels que les polymorphisme nucléotidique (SNPs) en un seul score génétique. Enfin, nous avons calculé l'information mutuelle basée sur la théorie de réponse d’item pour sélectionner les items sensibles dans ADAS-cog, une mesure de fonctionnement cognitif la plus utilisées dans les études de la maladie d'Alzheimer, afin de mieux évaluer le progrès de la maladie
Neurodegenerative diseases, such as Alzheimer's disease (AD) and Charcot Marie Tooth (CMT), are complex diseases. Their pathological mechanisms are still not well understood, and progress in the research and development of new potential disease-modifying therapies is slow. Categorical data, such as rating scales and Genome-Wide Association Study (GWAS) data, are widely used in the diagnosis, prediction, and progression monitoring of neurodegenerative diseases. It is important to understand and interpret these data correctly if we want to improve research on these diseases. The purpose of this thesis is to use modern psychometric item response theory to analyze these categorical data for a better understanding of neurodegenerative diseases and to facilitate the corresponding drug research. First, we applied Rasch analysis in order to assess the validity of the Charcot-Marie-Tooth Neuropathy Score (CMTNS), a main endpoint for CMT clinical trials. We then adapted the Rasch model to the analysis of genetic associations and used it to identify genes associated with Alzheimer's disease by summarizing the categorical genotypes of several genetic markers, such as Single Nucleotide Polymorphisms (SNPs), into one genetic score. Finally, to select sensitive items on the ADAS-cog, one of the most widely used measures of cognitive functioning in Alzheimer's disease studies, we calculated the mutual information based on the item response model to evaluate the sensitivity of each item.
APA, Harvard, Vancouver, ISO, and other styles
24

Brändström, Anna. "Differentiated tasks in mathematics textbooks : an analysis of the levels of difficulty." Licentiate thesis, Luleå, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-18110.

Full text
Abstract:
The aim of this work is to study differentiation in mathematics textbooks. Based on mathematics textbooks used in Sweden for year 7, the study is performed from the point of view that all students should be challenged and stimulated throughout their learning in compulsory school. Classroom studies and observations have shown textbooks to have a dominant role in mathematics education for both teachers and students. It is therefore important to study how tasks in textbooks are differentiated and how this can affect education in mathematics. The tasks are analysed with respect to their difficulty levels. The results of the study show that differentiation does occur in the textbook tasks, but at a low difficulty level for all students regardless of their mathematical abilities. In addition, the study shows that the use of pictures does not have any differentiating role in the analysed tasks.


APA, Harvard, Vancouver, ISO, and other styles
25

Morrison, Kristin M. "Impact of working memory burden and contextualization on cognitive complexity." Thesis, Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/47694.

Full text
Abstract:
Contextualization is often added to mathematical achievement items to place targeted mathematical operations in a real world context or in combination with other mathematical skills. Such items may have unintended sources of difficulty, such as greater cognitive complexity than specified in the test blueprint. These types of items are being introduced to achievement exams through assessment programs such as SBAC and PARCC. Cognitive models have been created to assess sources of cognitive complexity in mathematics items, including a global model (Embretson & Daniel, 2008) and an adapted model (Lutz, Embretson, & Poggio, 2010). The current study proposes a new cognitive model structured around sources of working memory burden, with an emphasis on contextualization. Full-information item response theory (IRT) models were applied to a state accountability test of mathematics achievement in middle school to examine the impact on psychometric properties related to burden on working memory.
APA, Harvard, Vancouver, ISO, and other styles
26

黎寶欣 and Po-yan Lai. "Effect of visual item arrangement on search performance." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B3124189X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Lai, Po-yan. "Effect of visual item arrangement on search performance." Hong Kong : University of Hong Kong, 2001. http://sunzi.lib.hku.hk/hkuto/record.jsp?B23530212.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Lees, Jared Andrew. "Differential Item Functioning Analysis of the Herrmann Brain Dominance Instrument." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2103.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Chen, Dong Qi Kayla. "Gender-related differential item functioning analysis on the GEPT-kids." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3953512.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Lee, Jung-Jung. "ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE." CSUSB ScholarWorks, 2016. https://scholarworks.lib.csusb.edu/etd/391.

Full text
Abstract:
Item response theory (IRT) offers several advantages over classical test theory (CTT) in providing additional information on the psychometric qualities of a scale. My goal was to demonstrate the superiority of IRT as compared to CTT through two analyses of the Top Leadership Direction Scale (TLDS), which was created to measure the effectiveness of top leadership through followers' perceptions in the context of providing guidance for the organization. The participants (n = 8,046) were employees in various positions at 18 of the 23 California State University campuses. In the graded response model (GRM) analysis, the results showed that IRT provided more information about each item and allowed a useful visual inspection of the items. With the second analysis, I aimed to provide evidence of measurement equivalence across functional groups of employees using differential item functioning (DIF) analysis in IRT. Due to the lack of model fit, the DIF analysis was incomplete. A supplementary multigroup CFA was conducted to investigate the structural difference across the groups for the items of the TLDS. The results of the multigroup CFA suggested that item 2 and item 4 did not show measurement equivalence across the groups at the construct level. An alternative IRT model was discussed due to some limitations of the GRM in the present study. Practical and theoretical implications for the use of IRT were also presented and contrasted with CTT.
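
A minimal sketch of the graded response model (GRM) referred to above, computing category response probabilities for a single polytomous item from cumulative boundary curves; the discrimination and threshold values are illustrative assumptions, not TLDS estimates:

    # Hedged sketch: Samejima graded response model category probabilities.
    import numpy as np

    def grm_category_probs(theta, a, thresholds):
        """P(X = k | theta) for k = 0..K, from cumulative 2PL boundary curves."""
        thresholds = np.asarray(thresholds, dtype=float)
        p_star = 1.0 / (1.0 + np.exp(-a * (theta - thresholds)))  # P(X >= k+1)
        cumulative = np.concatenate(([1.0], p_star, [0.0]))
        return cumulative[:-1] - cumulative[1:]

    # Item with discrimination 1.5 and four ordered thresholds (five response categories).
    probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-2.0, -0.5, 0.8, 2.0])
    print(np.round(probs, 3), "sum =", probs.sum())
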
APA, Harvard, Vancouver, ISO, and other styles
31

McBride, Nadine LeBarron. "An Item Response Theory Analysis of the Scales from the International Personality Item Pool and the NEO Personality Inventory-Revised." Thesis, Virginia Tech, 2001. http://hdl.handle.net/10919/34430.

Full text
Abstract:
Personality tests are widely used in the field of Industrial/Organizational Psychology; however, few studies have focused on their psychometric properties using Item Response Theory. This paper uses IRT to examine the test information functions (TIFs) of two personality measures: the NEO-PI-R and scales from the International Personality Item Pool. Results showed that most scales for both measures provided relatively consistent levels of information and measurement precision across levels of theta (θ). Although the NEO-PI-R provided overall higher levels of information and measurement precision, the IPIP scales provided greater efficiency in that they provided more precision per item. Both scales showed a substantial decrease in precision and information when response scales were dichotomized away from the original 5-point Likert scale format. Implications and further avenues for research are discussed.
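
A minimal sketch of a test information function and the "information per item" efficiency comparison described above, using the dichotomous 2PL for simplicity; the scale lengths and item parameters are illustrative assumptions, not NEO-PI-R or IPIP estimates:

    # Hedged sketch: 2PL test information and information per item.
    import numpy as np

    def test_information(theta, a, b):
        """2PL test information: sum over items of a^2 * P * (1 - P)."""
        p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
        return np.sum(a[:, None] ** 2 * p * (1 - p), axis=0)

    theta = np.linspace(-3, 3, 7)
    rng = np.random.default_rng(2)
    a_long, b_long = rng.uniform(0.8, 1.6, 48), rng.uniform(-2, 2, 48)    # longer scale
    a_short, b_short = rng.uniform(0.8, 1.6, 10), rng.uniform(-2, 2, 10)  # shorter scale

    for name, a, b in (("long scale", a_long, b_long), ("short scale", a_short, b_short)):
        info = test_information(theta, a, b)
        print(name, "info at theta = 0:", round(info[3], 2),
              "info per item:", round(info[3] / len(a), 3))
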
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
32

Eryilmaz, Hande. "Analysis Of A Two-echelon Multi-item Inventory System With Postponement." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/2/12611503/index.pdf.

Full text
Abstract:
Increased product proliferation and global competition are forcing companies within the supply chain to adopt new strategies. Postponement is an effective strategy that allows companies to be agile and cost effective in dealing with the dynamics of global supply chains. Postponement centres around delaying activities in the supply chain until real information about the market is available, which reduces the complexity and uncertainty of dealing with a proliferation of products. A two-echelon divergent supply chain entailing a central production facility and N retailers facing stochastic demand is studied within the inventory-distribution system. A periodic review order-up-to strategy is incorporated at all echelons. Unique to the study, five different systems are created, and the effectiveness of several postponement strategies (form and transshipment) under various operational settings is compared. The importance of postponement in an integrated supply chain context and its contribution to various sector implementations are also discussed. Simulation is used to analyze the performance of the systems, especially with respect to cost, order lead time, and the effectiveness of transshipment policies. The study is unique in determining factors that favour one system implementation over another and in distinguishing sector requirements that support postponement. In the study, postponement is found to be an effective strategy for managing item variety, demand uncertainty, and differences in review periods in the two-echelon supply chain across different experimental settings.
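
A minimal sketch of the periodic-review, order-up-to policy the study incorporates at each echelon, simulated here for a single retailer with assumed Poisson demand and lost sales rather than the thesis's full two-echelon system; S, the review period, the lead time, and the demand rate are illustrative parameters:

    # Hedged sketch: single-location periodic-review, order-up-to-S simulation.
    import numpy as np

    rng = np.random.default_rng(3)
    S, review_period, lead_time = 60, 5, 2
    inventory, pipeline = S, []              # on-hand stock and outstanding orders (arrival, qty)
    total_demand = total_shortage = 0

    for t in range(200):
        arrived = sum(q for (due, q) in pipeline if due == t)      # receive due orders
        pipeline = [(due, q) for (due, q) in pipeline if due != t]
        inventory += arrived
        demand = rng.poisson(8)                                    # stochastic demand
        sold = min(demand, inventory)                              # unmet demand is lost
        inventory -= sold
        total_demand += demand
        total_shortage += demand - sold
        if t % review_period == 0:                                 # periodic review
            position = inventory + sum(q for (_, q) in pipeline)   # on hand + on order
            if position < S:
                pipeline.append((t + lead_time, S - position))

    print(f"fill rate = {1 - total_shortage / total_demand:.3f}")
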
APA, Harvard, Vancouver, ISO, and other styles
33

Presnall-Shvorin, Jennifer R. "THE FIVE-FACTOR OBSESSIVE-COMPULSIVE INVENTORY: AN ITEM RESPONSE THEORY ANALYSIS." UKnowledge, 2015. http://uknowledge.uky.edu/psychology_etds/56.

Full text
Abstract:
Arguments have been made for dimensional models over categorical for the classification of personality disorder, and for the five-factor model (FFM) in particular. A criticism of the FFM of personality disorder is the absence of measures designed to assess pathological personality. Several measures have been developed based on the FFM to assess the maladaptive personality traits included within existing personality disorders. One such example is the Five-Factor Obsessive-Compulsive Inventory (FFOCI). The current study applied item response theory analyses (IRT) to test whether scales of the FFOCI are extreme variants of respective FFM facet scales. It was predicted that both the height and slope of the item-response curves would differ for the conscientiousness-based scales, due to the bias towards assessing high conscientiousness as adaptive in general personality inventories (such as Goldberg’s International Personality Item Pool; IPIP). Alternatively, the remaining FFOCI scales and their IPIP counterparts were predicted to demonstrate no significant differences in IRCs across theta. Nine hundred and seventy-two adults each completed the FFOCI and the IPIP, including 377 undergraduate students and 595 participants recruited online. A portion of the results supported the hypotheses, with select exceptions. Fastidiousness and Workaholism demonstrated the expected trends, with the FFOCI providing higher levels of fidelity at the higher end of theta, and the IPIP demonstrating superior coverage at the lower end of theta. Other conscientiousness scales failed to demonstrate the expected differences at a statistically significant level. In this context, the suitability of IRT in the analysis of rationally-derived, polytomous scales is explored.
APA, Harvard, Vancouver, ISO, and other styles
34

Redfern, Andrew, Erik Nelson, and Matthew White. "Price analysis on commercial item purchases within the Department of Defense." Thesis, Monterey, California: Naval Postgraduate School, 2013. http://hdl.handle.net/10945/37743.

Full text
Abstract:
Approved for public release; distribution is unlimited
Proficiency in completing price reasonableness determinations and documenting the contracting file properly is developed based on experience and completion of required contract pricing courses provided through the Defense Acquisition Workforce Improvement Act (DAWIA) certification process. As there is a wide range of skill levels within the contracting community, it is possible that employees surveyed may not have attended the required contracting pricing courses, or developed the skills required to properly complete price reasonableness determinations.
APA, Harvard, Vancouver, ISO, and other styles
35

Russell, Joseph F. "Analysis of commercial pricing factors : a framework for commercial item pricing." Thesis, Monterey, Calif.: Naval Postgraduate School, 2002. http://hdl.handle.net/10945/6028.

Full text
Abstract:
Approved for public release; distribution is unlimited.
Recent procurement reform initiatives within the Federal Government have served to significantly reduce the requirement for offerors to provide the Government with cost or pricing data in advance of contract negotiations. The goal of these initiatives is to streamline the procurement process and achieve a procurement environment that more closely resembles the practices of the commercial sector. In order for the Government Contracting Officer to effectively analyze an offer as fair and reasonable and obtain a negotiating position, the Contracting Officer must recognize and understand a myriad of elements that contribute to a commercial firm's pricing objectives. The purpose of this research is to examine the elements that influence a contractor's pricing as well as the factors applied to their purchasing decisions. This paper will present data that can be analyzed without the benefit of cost or pricing data. The thesis provides a framework for Government Contracting Officers to recognize and analyze this data in preparing for contract negotiations.
APA, Harvard, Vancouver, ISO, and other styles
36

Zhang, Mo. "Gender related differential item functioning in mathematics tests a meta-analysis /." Pullman, Wash. : Washington State University, 2009. http://www.dissertations.wsu.edu/Thesis/Summer2009/m_zhang_072109.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Siow, Christopher (Christopher Shun Yi). "Analysis of batching strategies for multi-item production with yield uncertainty." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/43093.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2008.
Includes bibliographical references (p. 179-180).
In this thesis, we investigate the batch sizing problem for a custom-job production facility. More specifically, given a production system that has been assigned several different types of custom jobs, we try to derive batching policies to minimize the expected total time that a job spends in the system. Custom-job production brings a host of challenges that make batch sizing very difficult - production can only begin when the order arrives, the yield uncertainty probabilities are fairly large, and the production quantities are typically small. Furthermore, deriving an optimal batch sizing policy is difficult due to the heterogeneity of the job types; each job type has a different demand, batch setup time, unit production rate, unit defective probability, and job arrival rate. In addition, further complexity stems from the fact that the batch sizing decisions for each job type are coupled, and cannot be made independently. Given the difficulties in selecting the batch sizes, we propose an alternative batching method that minimizes the system utilization instead of the expected total job time. The main advantage of this approach is that it allows us to choose the batch size of each job type individually. First, we model the system as an M/G/1 queue, and obtain a closed-form expression for the expected total job time when the demand is restricted to be a single unit. Next, we show empirically that the minimum utilization heuristic attains near-optimal performance under the unit demand restriction. We then build on this analysis, and extend the heuristic to the general case in which the demand of each job is allowed to be more than a single unit. Finally, we use simulations to compare our heuristic against other alternative batching policies, and the results indicate that our heuristic is indeed an effective strategy.
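As a rough sketch of the ideas summarized above (with hypothetical parameter values, and simplified relative to the thesis's actual derivations), the snippet below computes an expected service time for the unit-demand case under yield uncertainty, picks a batch size by a minimum-utilization rule, and includes the standard Pollaczek-Khinchine expression for expected time in an M/G/1 system.

```python
import numpy as np

def expected_service_time(b, setup, unit_time, defect_prob):
    """Expected production time to obtain at least one good unit (unit demand),
    when a batch of size b is rerun until it yields a good unit; the number
    of batch runs is geometric with success probability 1 - q**b."""
    p_success = 1.0 - defect_prob ** b
    return (setup + b * unit_time) / p_success

def min_utilization_batch(setup, unit_time, defect_prob, arrival_rate, b_max=50):
    """Pick the batch size that minimizes this job type's contribution to
    system utilization (arrival rate times expected service time)."""
    sizes = np.arange(1, b_max + 1)
    util = arrival_rate * np.array(
        [expected_service_time(b, setup, unit_time, defect_prob) for b in sizes])
    return sizes[np.argmin(util)], util.min()

def mg1_expected_time(arrival_rate, es, es2):
    """Pollaczek-Khinchine formula: expected time in an M/G/1 system given
    mean service time es and second moment es2."""
    rho = arrival_rate * es
    assert rho < 1, "system must be stable"
    return es + arrival_rate * es2 / (2.0 * (1.0 - rho))

# Hypothetical single job type: setup 2.0, unit time 0.5, 40% defect rate.
b_star, _ = min_utilization_batch(setup=2.0, unit_time=0.5,
                                  defect_prob=0.40, arrival_rate=0.2)
print("utilization-minimizing batch size:", b_star)
```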
by Christopher Siow.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
38

Gurell, Seth Michael. "Measuring the Technical Difficulty in Reusing Open Educational Resources with the ALMS Analysis Framework." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3472.

Full text
Abstract:
The Open Educational Resources (OER) movement started roughly ten years ago (Wiley & Gurell, 2009). Since that time thousands of resources have been produced. Though these resources have been used both for classroom development and for the autodidact, the development of OER was not without problems. Incompatibility between Creative Commons licenses has made revising and remixing two resources difficult, if not impossible (Linksvayer, 2006). Tools to help educators find appropriate educational resources have been necessary but are still nascent. Educators' perceived quality issues have also hampered adoption (Wiley & Gurell, 2009). The result is that resources were only being minimally reused (Wiley, 2009). One possible reason observed for the limited reuse was the barrier of technology. Some resources were easier to view, revise and remix from a technical perspective than others. Hilton, Wiley, Stein, and Johnson (2010) created the ALMS analysis framework to assess the technical openness of an OER. Although the ALMS framework allowed for an assessment of OER, no pilot instrument was reported in the Hilton et al. (2010) article. The framework had not been tested because there was no known rubric with which measurement could occur. Consequently, Hilton et al.'s framework needed to be further developed and tested against a range of open educational resources. This dissertation examined the ALMS analysis, which was previously only a concept, in order to create a concrete framework with sufficient detail and documentation for comparisons to be made among OERs. The rubric was further refined through a Delphi study consisting of experts in the field of OER (n=5). A sample of OERs (n=27) was rated by a small group of raters (n=4) to determine inter-rater reliability. Intra-class correlation indicated moderate agreement (ICC(2,1) =.655, df=376, 95% CI [.609, .699]). Findings suggested that the degree of technical difficulty in reusing OERs can be measured in a somewhat reliable manner. These findings may be insightful in developing policies and practices regarding OER development.
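For reference, the ICC(2,1) statistic reported above can be computed from a ratings matrix using a two-way random-effects decomposition (Shrout & Fleiss); the sketch below uses simulated ratings rather than the study's data.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater
    (Shrout & Fleiss, 1979). `ratings` is an (n_items x k_raters) array."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-items mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical ratings: 27 OERs scored by 4 raters on a 1-5 rubric.
rng = np.random.default_rng(0)
true_scores = rng.integers(1, 6, size=(27, 1))
ratings = np.clip(true_scores + rng.integers(-1, 2, size=(27, 4)), 1, 5)
print(round(icc_2_1(ratings), 3))
```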
APA, Harvard, Vancouver, ISO, and other styles
39

Mehta, Vandhana. "Structural Validity and Item Functioning of the LoTi Digital-Age Survey." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc68014/.

Full text
Abstract:
The present study examined the structural construct validity of the LoTi Digital-Age Survey, a measure of teacher instructional practices with technology in the classroom. Teacher responses (N = 2840) from across the United States were used to assess the factor structure of the instrument using both exploratory and confirmatory analyses. Parallel analysis suggests retaining a five-factor solution, whereas the MAP test suggests retaining a three-factor solution. Both analyses (EFA and CFA) indicate that changes need to be made to the current factor structure of the survey. The last two factors were composed of items that did not cover or accurately measure the content of the latent trait. Problematic items, such as items with cross-loadings, were discussed. Suggestions were provided to improve the factor structure, items, and scale of the survey.
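As an illustration of one of the retention criteria mentioned above, the sketch below implements Horn's parallel analysis on a respondents-by-items matrix; the data and item count are placeholders, not the LoTi responses.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, quantile=0.95, seed=0):
    """Horn's parallel analysis on a (respondents x items) matrix: retain
    components whose observed correlation-matrix eigenvalue exceeds the
    chosen quantile of eigenvalues from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    sim_eig = np.empty((n_sims, p))
    for s in range(n_sims):
        rand = rng.standard_normal((n, p))
        sim_eig[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False)))[::-1]

    threshold = np.quantile(sim_eig, quantile, axis=0)
    return int(np.sum(obs_eig > threshold)), obs_eig, threshold

# Hypothetical usage with simulated survey responses (respondents x items):
rng = np.random.default_rng(1)
responses = rng.standard_normal((2840, 37))  # placeholder, not LoTi item scores
n_retain, _, _ = parallel_analysis(responses)
print("factors to retain:", n_retain)
```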
APA, Harvard, Vancouver, ISO, and other styles
40

Posada, Jarred L., and David E. Caballero. "Item unique identification capability expansion: established process analysis, cost benefit analysis, and optimal marking procedures." Thesis, Monterey, California: Naval Postgraduate School, 2014. http://hdl.handle.net/10945/44647.

Full text
Abstract:
Approved for public release; distribution is unlimited
The purpose of this Master of Business Administration project is to identify possible expansion capabilities by researching the most cost-effective two-dimensional barcode technology, known as item unique identification, that will allow for tracking Department of the Navy assets from cradle to grave. Although the Navy has not yet completed this marking effort, the Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics mandated that all new and legacy tangible items over $5,000 and/or serially managed, mission critical, or controlled by inventory must be serialized and registered by 2010. There are two methods that the Navy can use to mark such items: intrusive and nonintrusive. For legacy items, the best marking method would be nonintrusive, due to the criticality of maintaining the integrity of the item for safety reasons. Thus, it was determined that the best marking procedure for legacy items would be metal foil tags, generated by a contracting company, since they are the most cost-effective, nonintrusive marking method.
APA, Harvard, Vancouver, ISO, and other styles
41

Fragoso, Tiago de Miranda. "Modelos multidimensionais da teoria de resposta ao item." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-21092010-113121/.

Full text
Abstract:
Educational evaluations, psychological testing and market surveys are examples of studies aiming to quantify an underlying construct of interest through multiple-choice item tests. Item Response Theory (IRT) is a class of models used to analyse such data. There are several IRT models already being used in applied studies to that end, either for dichotomous answers (right/wrong, present/absent, yes/no) or for items with nominal or ordinal answers. However, the large majority of those models make the assumption that only one latent trait is sufficient to explain the probability of a correct answer to an item (unidimensional models). Since many situations in practice are characterized by multiple aptitudes (latent traits) influencing such probabilities, multidimensional models that take such traits into consideration gain great importance. In the present work, after a thorough review of the literature on multidimensional IRT models, we studied one model in depth: the two-parameter multidimensional logistic model for dichotomous items. The marginal maximum likelihood method used to estimate the item parameters, the maximum likelihood method used for the latent traits, and Bayesian methods for parameter estimation were studied, compared, implemented in the R software, and then applied to real datasets to assess depression using the Beck Depression Inventory (BDI) and to data from the Exame Nacional do Ensino Médio (ENEM).
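For orientation, the item response function of the multidimensional two-parameter logistic model studied in the thesis is commonly written as P(X=1 | theta) = 1 / (1 + exp(-(a·theta + d))); the sketch below evaluates it for a hypothetical two-dimensional case (the thesis's own implementation was in R, so this Python fragment is only illustrative).

```python
import numpy as np

def m2pl_prob(theta, a, d):
    """Multidimensional two-parameter logistic model for dichotomous items:
    P(X = 1 | theta) = 1 / (1 + exp(-(a . theta + d))), with a vector of
    discriminations `a` (one per latent dimension) and a scalar intercept
    `d` related to item easiness."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

# Hypothetical two-dimensional example: a person's position on two latent
# traits, and an item that loads mainly on the first dimension.
theta = np.array([0.5, -1.0])
a = np.array([1.2, 0.3])
d = -0.4
print(round(m2pl_prob(theta, a, d), 3))
```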
APA, Harvard, Vancouver, ISO, and other styles
42

Wallot, Sebastian. "The role of reading fluency, text difficulty and prior knowledge in complex reading tasks." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1321370968.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Hong, Ching-Ping, and 洪至評. "The Conditional Item Difficulty Scale Analysis." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/qpest4.

Full text
Abstract:
Master's thesis
Chung Yuan Christian University
Graduate Institute of Applied Mathematics
92
Compared with traditional methods, the Rasch model considers the latent trait of human ability and the characteristics of the items separately. The advantage of using the Rasch model is that its analysis is closer to the real situation. In this research, the Rasch model is extended to the multinomial case. The commonly used parameter estimation methods rely on iteration; therefore, it is necessary to design a different program for each model. We show that the functional form of the conditional model is similar to that of a multinomial logit model, which has an equivalent loglinear model form. Because the model shares the functional form of the usual multinomial logit model, the item difficulties can be estimated with standard generalized linear model routines. The advantage of using the linear model form is that it does not require new software or a custom program to analyze the data; general statistical software can be used. Moreover, the method discussed in this research can also be used to analyze other models whose functional forms are similar to that of the multinomial logit model. Finally, we hope the item difficulties calculated in this research will help teachers when they design tests.
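To illustrate the practical point made above (that Rasch-type item difficulties can be obtained with ordinary generalized linear model software), the sketch below fits a joint, unconditional logit model with person and item indicator variables to simulated data. It is a simplification of the thesis's conditional/loglinear approach, intended only to show the mechanics of using a standard GLM routine.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulate dichotomous responses from a Rasch model.
rng = np.random.default_rng(2)
n_persons, n_items = 300, 10
ability = rng.normal(0, 1, n_persons)
difficulty = np.linspace(-1.5, 1.5, n_items)
prob = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
resp = (rng.random((n_persons, n_items)) < prob).astype(int)

# Drop all-correct / all-wrong response patterns: they carry no information
# about item difficulty and can destabilize the dummy-variable fit.
keep = (resp.sum(axis=1) > 0) & (resp.sum(axis=1) < n_items)
resp = resp[keep]
n_kept = resp.shape[0]

# Long format: one row per person-item response.
long = pd.DataFrame({
    "correct": resp.ravel(),
    "person": np.repeat(np.arange(n_kept), n_items),
    "item": np.tile(np.arange(n_items), n_kept),
})

# Joint fit with a standard GLM: person and item dummy variables. Relative
# item difficulties are recovered (up to the reference item) as the
# negatives of the item coefficients.
fit = smf.glm("correct ~ C(person) + C(item)", data=long,
              family=sm.families.Binomial()).fit()
rel_difficulty = -np.array([fit.params[f"C(item)[T.{i}]"]
                            for i in range(1, n_items)])
print(np.round(rel_difficulty, 2))
```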
APA, Harvard, Vancouver, ISO, and other styles
44

Sanderson, Penelope Jane. "Multiple-choice questions : linguistic investigation of difficulty for first-language and second-language students." Thesis, 2010. http://hdl.handle.net/10500/4836.

Full text
Abstract:
Multiple-choice questions are acknowledged to be difficult for both English mother-tongue and second-language university students to interpret and answer. In a context in which university tuition policies are demanding explicitly that assessments need to be designed and administered in such a way that no students are disadvantaged by the assessment process, the thesis explores the fairness of multiple-choice questions as a way of testing second-language students in South Africa. It explores the extent to which two multiple-choice Linguistics examinations at Unisa are in fact ‘generally accessible’ to second-language students, focusing on what kinds of multiple-choice questions present particular problems for second-language speakers and what contribution linguistic factors make to these difficulties. Statistical analysis of the examination results of two classes of students writing multiple-choice exams in first-year Linguistics is coupled with a linguistic analysis of the examination papers to establish the readability level of each question and whether the questions adhered to eight item-writing guidelines relating to maximising readability and avoiding negatives, long items, incomplete sentence stems, similar answer choices, grammatically non-parallel answer choices, ‘All-of-the-above’ and ‘None-of-the-above’ items. Correlations are sought between question difficulty and aspects of the language of these questions and an attempt is made to investigate the respective contributions of cognitive difficulty and linguistic difficulty on student performance. To complement the quantitative portion of the study, a think-aloud protocol was conducted with 13 students in an attempt to gain insight into the problems experienced by individual students in reading, understanding and answering multiple-choice questions. The consolidated quantitative and qualitative findings indicate that among the linguistic aspects of questions that contributed to question difficulty for second language speakers was a high density of academic words, long items and negative stems. These sources of difficulty should be addressed as far as possible during item-writing and editorial review of questions.
APA, Harvard, Vancouver, ISO, and other styles
45

Bekwa, Nomvuyo Nomfusi. "The development and evaluation of Africanised items for multicultural cognitive assessment." Thesis, 2016. http://hdl.handle.net/10500/23591.

Full text
Abstract:
Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. (Marie Curie) Debates about how best to test people from different contexts and backgrounds continue to hold the spotlight of testing and assessment. In an effort to contribute to the debates, the purpose of the study was to develop and evaluate the viability and utility of nonverbal figural reasoning ability items that were developed based on inspirations from African cultural artefacts such as African material prints, art, decorations, beadwork, paintings, et cetera. The research was conducted in two phases, with phase 1 focused on the development of the new items, while phase 2 was used to evaluate the new items. The aims of the study were to develop items inspired by African art and cultural artefacts in order to measure general nonverbal figural reasoning ability; to evaluate the viability of the items in terms of their appropriateness in representing the African art and cultural artefacts, specifically to determine the face and content validity of the items from a cultural perspective; and to evaluate the utility of the items in terms of their psychometric properties. These elements were investigated using the exploratory sequential mixed method research design with quantitative embedded in phase 2. For sampling purposes, the sequential mixed method sampling design and non-probability sampling strategies were used, specifically the purposive and convenience sampling methods. The data collection methods that were used included interviews with a cultural expert and a colour-blind person, open-ended questionnaires completed by school learners and test administration to a group of 946 participants undergoing a sponsored basic career-related training and guidance programme. Content analysis was used for the qualitative data while statistical analysis mainly based on the Rasch model was utilised for quantitative data. The results of phase 1 were positive and provided support for further development of the new items, and based on this feedback, 200 new items were developed. This final pool of items was then used for phase 2 – the evaluation of the new items. The statistical analysis of the new items indicated acceptable psychometric properties of the general reasoning (“g” or fluid ability) construct. The item difficulty values (p-values) for the new items were determined using classical test theory (CTT) analysis and ranged from 0.06 (most difficult item) to 0.91 (easiest item). Rasch analysis showed that the new items were unidimensional and that they were adequately targeted to the level of ability of the participants, although there were elements that would need to be improved. The reliability of the new items was determined using the Cronbach alpha reliability coefficient (α) and the person separation index (PSI), and both methods indicated similar indices of internal consistency (α = 0.97; PSI = 0.96). Gender-related differential item functioning (DIF) was investigated, and the majority of the new items did not indicate any significant differences between the gender groups. Construct validity was determined from the relationship between the new items and the Learning Potential Computerised Adaptive Test (LPCAT), which uses traditional item formats to measure fluid ability. The correlation results for the total score of the new items and the pre- and post-tests were 0.616 and 0.712 respectively.
The new items were thus confirmed to be measuring fluid ability using nonverbal figural reasoning ability items. Overall, the results were satisfactory in indicating the viability and utility of the new items. The main limitation of the research was that, because the sample was not representative of the South African population, the possibilities for generalisation were limited. This led to a further limitation, namely that it was not possible to conduct important analyses of DIF for various other subgroups. Further research has been recommended to build on this initiative.
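For readers unfamiliar with the CTT statistics cited above, the sketch below computes item p-values and Cronbach's alpha from a persons-by-items score matrix; the data are simulated, not the study's.

```python
import numpy as np

def ctt_item_stats(responses):
    """Classical test theory statistics for a (persons x items) matrix of
    dichotomous scores: item difficulty as the proportion correct (p-value)
    and Cronbach's alpha for internal consistency."""
    x = np.asarray(responses, dtype=float)
    p_values = x.mean(axis=0)                 # easier items -> higher p-value
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    return p_values, alpha

# Hypothetical data: 946 examinees answering 200 items whose correctness
# depends on a single latent ability (so alpha should come out high).
rng = np.random.default_rng(3)
ability = rng.normal(0, 1, 946)
difficulty = rng.uniform(-2.5, 2.5, 200)
prob = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
scores = (rng.random((946, 200)) < prob).astype(int)

p_vals, alpha = ctt_item_stats(scores)
print(round(float(p_vals.min()), 2), round(float(p_vals.max()), 2),
      round(float(alpha), 2))
```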
Industrial and Organisational Psychology
APA, Harvard, Vancouver, ISO, and other styles
46

Yurecko, Michele. "Investigating the relationship between reading achievement, and state-level ecological variables and educational reform: a hierarchical analysis of item difficulty variation." 2009. http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.000051083.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Lin, Pei-Ying. "Setting Accommodation and Item Difficulty." Thesis, 2012. http://hdl.handle.net/1807/32813.

Full text
Abstract:
This study used multilevel measurement modeling to examine the differential difficulties of math and reading items for Grade 6 students participating in Ontario’s provincial assessment in 2005-2006, in relation to whether they received a setting accommodation, had a learning disability (LD), and spoke a language in addition to English. Both differences in difficulty between groups of students for all items (impact) and for individual items (differential item functioning) were examined. Students’ language backgrounds (whether they spoke a language in addition to English) were not significantly related to item difficulty. Compared to non-accommodated students with LD, math and reading items were relatively difficult for accommodated students with LD. Moreover, the difference in overall impact on math items was larger than on reading items for accommodated and non-accommodated students with LD. Overall, students without LD and who did not receive a setting accommodation outperformed students with LD and/or who received a setting accommodation as well as accommodated students without LD. It is important to note that, because this was an operational test administration, students were assigned to receive accommodations by their schools based on their individual needs. It is, therefore, not possible to separate the effect of the setting accommodation on item difficulty from the effects of other differences between the accommodated and non-accommodated groups. The differences in math and reading item difficulties between accommodated and non-accommodated students with LD may be due in part to factors such as comorbidity of LD and attention deficit hyperactivity disorder (ADHD) or a possible mismatch between the setting accommodation and the areas of disabilities. Moreover, the results of the present study support the underarousal/optimal stimulation hypothesis instead of the premise of the inhibitory control and attention for the use of setting accommodation. After controlling for the impact across all items of setting accommodation and LD, several math and reading items were found to exhibit differential item functioning (DIF). The possible sources of DIF were (1) math items that were not adherent to specific item-writing rules and (2) reading items targeting different types of comprehension. This study also found that the linguistic features of math items (total words, total sentences, average word length, monosyllabic words for math) and reading items (word frequency, average sentence length, and average words per sentence for reading) were associated with math and reading item difficulties for students with different characteristics. The total sentences and average word length in a math item as well as total words in a reading item significantly predicted the achievement gap between groups. Therefore, the linguistic features should be taken into account when assessments are developed and validated for examinees with varied characteristics.
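The study itself used multilevel measurement models; as a simpler, commonly used alternative for screening individual items, the sketch below runs a logistic-regression DIF check (conditioning on a rest-score and testing group and interaction terms) on simulated data with hypothetical variable names.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format data for one math item: each row is a student,
# with the item score, the rest-of-test total, and group membership
# (1 = received a setting accommodation, 0 = did not).
rng = np.random.default_rng(4)
n = 2000
group = rng.integers(0, 2, n)
rest_score = rng.normal(0, 1, n) - 0.3 * group       # impact: groups differ overall
logit = 1.1 * rest_score - 0.2 - 0.4 * group         # extra gap -> uniform DIF
item = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
df = pd.DataFrame({"item": item, "rest": rest_score, "group": group})

# Logistic-regression DIF screen: a significant `group` term after
# conditioning on `rest` suggests uniform DIF; a significant
# `rest:group` interaction suggests non-uniform DIF.
fit = smf.glm("item ~ rest + group + rest:group", data=df,
              family=sm.families.Binomial()).fit()
print(fit.params)
print(fit.pvalues.round(4))
```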
APA, Harvard, Vancouver, ISO, and other styles
48

"Nonword Item Generation: Predicting Item Difficulty in Nonword Repetition." Master's thesis, 2011. http://hdl.handle.net/2286/R.I.14380.

Full text
Abstract:
The current study employs item difficulty modeling procedures to evaluate the feasibility of potential generative item features for nonword repetition. Specifically, the extent to which the manipulated item features affect the theoretical mechanisms that underlie nonword repetition accuracy was estimated. Generative item features were based on the phonological loop component of Baddeley's model of working memory, which addresses phonological short-term memory (Baddeley, 2000, 2003; Baddeley & Hitch, 1974). Using researcher-developed software, nonwords were generated to adhere to the phonological constraints of Spanish. Thirty-six nonwords were chosen based on the set of item features identified by the proposed cognitive processing model. Using a planned missing data design, two hundred fifteen Spanish-English bilingual children were administered 24 of the 36 generated nonwords. Multiple regression and explanatory item response modeling techniques (e.g., linear logistic test model, LLTM; Fischer, 1973) were used to estimate the impact of item features on item difficulty. The final LLTM included three item radicals and two item incidentals. Results indicated that the LLTM-predicted item difficulties were highly correlated with the Rasch item difficulties (r = .89) and accounted for a substantial amount of the variance in item difficulty (R2 = .79). The findings are discussed in terms of validity evidence in support of using the phonological loop component of Baddeley's model (2000) as a cognitive processing model for nonword repetition items and the feasibility of using the proposed radical structure as an item blueprint for the future generation of nonword repetition items.
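The LLTM referred to above decomposes each item's Rasch difficulty into feature contributions; written in the usual notation (not copied from the thesis), the decomposition is:

```latex
% LLTM decomposition of item difficulty (Fischer, 1973), in common notation:
% beta_i is the Rasch difficulty of item i, q_{ik} indicates whether item i
% carries feature k (a radical or incidental), eta_k is that feature's
% difficulty contribution, and c is a normalization constant.
\[
\beta_i \;=\; \sum_{k=1}^{K} q_{ik}\,\eta_k \;+\; c ,
\qquad
P(X_{pi}=1 \mid \theta_p) \;=\;
\frac{\exp\!\bigl(\theta_p - \sum_{k} q_{ik}\eta_k - c\bigr)}
     {1+\exp\!\bigl(\theta_p - \sum_{k} q_{ik}\eta_k - c\bigr)} .
\]
```

In practice the eta weights can be estimated directly by maximum likelihood or, as an approximation, by regressing Rasch item difficulty estimates on the feature codes, which corresponds to the multiple-regression step mentioned above.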
Dissertation/Thesis
M.A. Educational Psychology 2011
APA, Harvard, Vancouver, ISO, and other styles
49

Hsiao, Meng-Ting, and 蕭孟莛. "The Investigation of Item Difficulty of Pentomino Combination Tasks." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/d4qkjn.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Graduate Institute of Technological and Vocational Education
95
The purpose of the study is to investigate the item difficulty of the Pentomino combination tasks that appear in the Figural subscale of the Scholastic Aptitude Test (SAT) developed by the College Entrance Examination Center (CEEC). The population of the SAT is Taiwan's first-year senior high school students. The subscale was administered in January 2007, and its norm consisted of 1836 students, with 878 boys and 958 girls. Based on the item characteristics 'number of Pentominoes' and 'size of maximum complete rectangles', the study conducted t-tests and a two-way ANOVA. The results show that items using more Pentominoes are more difficult, because the combinations and solutions are more complicated, and that items are more difficult when the maximum complete rectangle is larger, because there are fewer hint targets. Test scores interact with the number of Pentominoes and the size of the maximum rectangles: the high-score group scored about the same regardless of the number of Pentominoes and the size of the maximum rectangles. It is suggested that qualitative research be conducted with the high-score group for further investigation. Later versions of the figural subscale can increase item difficulty with more Pentominoes and larger rectangles. Other possible factors that influence item difficulty, such as transformation strategy and the difficulty of each Pentomino, can be considered in future test revisions. Restricted to paper-and-pencil administration, some complicated spatial items are not easily drawn and shown in a 2D way; as a result, authentic spatial ability cannot be tested adequately. It is suggested that more performance tests and computerized tests be developed to help investigate the construct of spatial ability, so that spatial theory and tests can be established to increase our understanding of spatial ability and its application.
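As a sketch of the kind of factorial analysis described above (simulated item-level data and made-up effect sizes, not the CEEC items), the snippet below fits a two-way ANOVA of item difficulty on the number of Pentominoes and the rectangle-size category.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical item-level data: each row is one Pentomino item with its
# difficulty (e.g., proportion incorrect), the number of Pentominoes used,
# and the size category of the maximum complete rectangle.
rng = np.random.default_rng(5)
n_items = 60
n_pent = rng.integers(2, 6, n_items)                 # 2-5 Pentominoes per item
rect = rng.choice(["small", "medium", "large"], n_items)
rect_bonus = np.array([{"small": 0.0, "medium": 0.1, "large": 0.2}[r] for r in rect])
difficulty = 0.1 * n_pent + rect_bonus + rng.normal(0, 0.05, n_items)
df = pd.DataFrame({"difficulty": difficulty,
                   "n_pent": n_pent.astype(str),
                   "rect": rect})

# Two-way ANOVA with interaction, mirroring the design described above
# (factors: number of Pentominoes x size of maximum complete rectangle).
model = smf.ols("difficulty ~ C(n_pent) * C(rect)", data=df).fit()
print(anova_lm(model, typ=2))
```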
APA, Harvard, Vancouver, ISO, and other styles
50

Lee, Chu-yuan, and 李主媛. "Exploring the Cognitive Components of Item Difficulty on English Comprehension Test." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/80640299587136105256.

Full text
Abstract:
Master's thesis
National University of Tainan
Master's Program, Graduate Institute of Measurement and Statistics
97
The purpose of this study is to propose an analysis framework for item cognitive complexity on an English comprehension test. The contrast in cognitive components between items of different difficulty levels is also discussed. The 88 reading comprehension items of the Gifted Students Screening Assessment-English (GISA-ENG) for 7th graders were used for this analysis. A coding framework of five cognitive components (plausible distractors, integrated description, degree of inference, number of stem words, and negative wording of the stem) was developed to predict the item difficulty parameters. There is no significant difference between the Rasch model and the Rasch testlet model for these data. The results suggest that the proposed framework can predict around 58.3% of the difficulty variance. The main difference between basic- and proficient-level students lies in performance on the easier items at the proficient level. Most of these items have no obvious plausible distractors and do not require inferences. On these items, 85% of proficient-level students answered correctly, but only 36% of basic-level students did. The implications of these results for standards definition are discussed.
APA, Harvard, Vancouver, ISO, and other styles
