
Dissertations / Theses on the topic 'Differential Item Functioning (DIF)'


Consult the top 50 dissertations / theses for your research on the topic 'Differential Item Functioning (DIF).'


1

Lee, Yoonsun. "The impact of a multidimensional item on differential item functioning (DIF) /." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/7920.

2

Yildirim, Huseyin Husnu. "The Differential Item Functioning (dif) Analysis Of Mathematics Items In The International Assessment Programs." Phd thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12607135/index.pdf.

Abstract:
Cross-cultural studies like TIMSS and PISA have been conducted since the 1960s on the premise that such assessments can provide a broad perspective for evaluating and improving education. In addition, countries can assess their relative positions in mathematics achievement among their competitors in the global world. However, because of the different cultural and language settings of the participating countries, these international tests may not function as expected in every country: the tests may not be linguistically and culturally equivalent, or fair, across the participating countries. In this context, the present study aimed at assessing the equivalence of the mathematics items of TIMSS 1999 and PISA 2003 across cultures and languages, to find out whether mathematics achievement possesses any culture-specific aspects. For this purpose, the present study assessed the Turkish and English versions of the TIMSS 1999 and PISA 2003 mathematics items with respect to (a) the psychometric characteristics of the items and (b) possible sources of differential item functioning (DIF) between the two versions. The study used restricted factor analysis, Mantel-Haenszel statistics, and item response theory likelihood ratio methodologies to determine DIF items. The results revealed adaptation problems in both the TIMSS and PISA studies. However, it was still possible to determine a subtest of items functioning fairly across cultures, to form a basis for a cross-cultural comparison. In PISA, there was a high rate of agreement among the DIF methodologies used. In TIMSS, however, the agreement rate decreased considerably, possibly because the rate of differentially functioning items within TIMSS was higher, and differential guessing and differential discrimination were also issues in the test. The study also revealed that items requiring competencies of reproduction of practiced knowledge, knowledge of facts, performance of routine procedures, and application of technical skills were less likely to be biased against Turkish students relative to American students at the same ability level. On the other hand, items requiring students to communicate mathematically, items where various results must be compared, and items with a real-world context were less likely to favor Turkish students.
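For orientation, the Mantel-Haenszel statistic named above reduces to a common odds ratio across score strata. A minimal sketch on hypothetical 0/1 response data follows; the variable names and the ETS delta rescaling are illustrative, not taken from the thesis:

```python
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item:  0/1 responses to the studied item
    total: matching variable, e.g. total test score
    group: 0 = reference, 1 = focal
    """
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        a = np.sum((group[m] == 0) & (item[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (item[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (item[m] == 0))  # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    alpha = num / den              # 1.0 means no DIF at any score level
    delta = -2.35 * np.log(alpha)  # ETS delta metric; |delta| >= 1.5 is flagged as large
    return alpha, delta
```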
3

Lees, Jared Andrew. "Differential Item Functioning Analysis of the Herrmann Brain Dominance Instrument." Diss., 2007. http://contentdm.lib.byu.edu/ETD/image/etd2103.pdf.

4

Duncan, Susan Cromwell. "Improving the prediction of differential item functioning: a comparison of the use of an effect size for logistic regression DIF and Mantel-Haenszel DIF methods." Diss., Texas A&M University, 2003. http://hdl.handle.net/1969.1/5876.

Abstract:
Psychometricians and test developers use DIF analysis to determine whether there is possible bias in a given test item. This study examines the conditions under which two predominant methods for detecting differential item functioning compare with each other in item bias detection, using an effect size statistic as the basis for comparison. The main focus of the present research was to test whether incorporating an effect size for LR DIF detects DIF more accurately, and to compare the utility of an effect size index across the MH DIF and LR DIF methods. A simulation study compared the accuracy of the MH DIF and LR DIF methods using a p value alone or supplemented with an effect size. Effect sizes were found to increase the accuracy and likelihood of DIF detection across varying ability distributions, population distributions, and sample size combinations. Varying ability distributions and sample size combinations affected the detection of DIF, while population distributions did not seem to affect it.
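One way to supplement the LR DIF significance test with an effect size, in the spirit of this study, is the gain in a pseudo-R² between nested logistic models. The following is a hedged sketch with hypothetical data and names, not the dissertation's own code or its exact index:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def lr_dif(item, theta, group):
    """Logistic-regression DIF with a pseudo-R^2 effect size.

    Compares an ability-only model with a model adding group and
    ability-by-group terms (uniform + nonuniform DIF, 2 df).
    """
    base = sm.Logit(item, sm.add_constant(theta)).fit(disp=0)
    X = sm.add_constant(np.column_stack([theta, group, theta * group]))
    full = sm.Logit(item, X).fit(disp=0)
    stat = 2 * (full.llf - base.llf)          # likelihood-ratio chi-square
    p = chi2.sf(stat, df=2)
    effect = full.prsquared - base.prsquared  # McFadden pseudo-R^2 gain
    return stat, p, effect
```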
5

Clark, Patrick Carl Jr. "An Examination of Type I Errors and Power for Two Differential Item Functioning Indices." Wright State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1284475420.

6

Swander, Carl Joseph. "Assessing the Differential Functioning of Items and Tests of a Polytomous Employee Attitude Survey." Thesis, Virginia Tech, 1999. http://hdl.handle.net/10919/9863.

Abstract:
Dimensions of a polytomous employee attitude survey were examined for the presence of differential item functioning (DIF) and differential test functioning (DTF) using Raju, van der Linden, and Fleer's (1995) differential functioning of items and tests (DFIT) framework. Comparisons were made between managers and non-managers on the 'Management' dimension, and between medical staff and nurse staff employees on both the 'Management' and 'Quality of Care and Service' dimensions. Two of the 21 items in the manager/non-manager comparison were found to have significant DIF, supporting the generalizability of Lynch, Barnes-Farell, and Kulikowich (1998). No items in the medical staff/nurse staff comparisons were found to have DIF. The DTF results indicated that in two of the three comparisons, one item could be removed to create dimensions free from DTF. Based on the current findings, implications and future research are discussed.
Master of Science
7

Conoley, Colleen Adele. "Differential item functioning in the Peabody Picture Vocabulary Test - Third Edition: partial correlation versus expert judgment." Diss., Texas A&M University, 2003. http://hdl.handle.net/1969.1/151.

Abstract:
This study had three purposes: (1) to identify differential item functioning (DIF) on the PPVT-III (Forms A & B) using a partial correlation method, (2) to find a consistent pattern in items identified as underestimating ability in each ethnic minority group, and (3) to compare findings from an expert judgment method and a partial correlation method. Hispanic, African American, and white subjects were provided by American Guidance Service (AGS) from the standardization sample of the PPVT-III; English language learners (ELL) of Mexican descent were recruited from school districts in Central and South Texas. Content raters were all self-selected volunteers, each with an advanced degree and a career in education but no special expertise regarding ELL or ethnic minorities. Two groups of teachers participated as judges for this study: the "expert" group was selected because of their special knowledge of ELL students of Mexican descent, and the control group comprised regular education teachers with limited exposure to ELL. Using the partial correlation method, DIF was detected within each group comparison. In all cases except the ELL comparison on Form A of the PPVT-III, there were no significant differences in the numbers of items with significant positive versus significant negative correlations. On Form A, the ELL group comparison indicated more items with negative than positive correlations [χ²(1) = 5.538, p = .019]. Among the items flagged as underestimating the ability of the ELL group, no consistent trend could be detected. It was also found that none of the expert judges could adequately predict the items that would underestimate ability for the ELL group, despite their expertise. Discussion includes possible consequences of item placement and recommendations regarding further research and use of the PPVT-III.
8

Asil, Mustafa. "Differential item functioning (DIF) analysis of the verbal section of the 2003 student selection examination (SSE)." The Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=osu1399553097.

9

Kim, Jihye. "Controlling Type 1 Error Rate in Evaluating Differential Item Functioning for Four DIF Methods: Use of Three Procedures for Adjustment of Multiple Item Testing." Digital Archive @ GSU, 2010. http://digitalarchive.gsu.edu/eps_diss/67.

Abstract:
In DIF studies, a Type I error refers to the mistake of identifying non-DIF items as DIF items, and the Type I error rate refers to the proportion of Type I errors in a simulation study. The possibility of making a Type I error in DIF studies is always present, and a high probability of making such an error can weaken the validity of the assessment. The quality of a test assessment is therefore related to the Type I error rate and to how that rate is controlled. Current DIF studies have found that the Type I error rate can be affected by several factors, such as test length, sample size, test group size, group mean difference, group standard deviation difference, and the underlying model. This study focused on another factor, one that had not yet been investigated: the effect of multiple testing. DIF analysis conducts multiple significance tests across the items of a test, and such multiple testing may increase the possibility of making at least one Type I error. The main goal of this dissertation was to investigate how to control the Type I error rate using adjustment procedures for multiple testing that have been widely used in applied statistics but rarely in DIF studies. In the simulation study, four DIF methods were performed under a total of 36 testing conditions: the Mantel-Haenszel method, the logistic regression procedure, the differential functioning of items and tests (DFIT) framework, and Lord's chi-square test. The Bonferroni correction, Holm's procedure, and the Benjamini-Hochberg (BH) method were then applied as adjustments for multiple significance testing. The results of this study showed the effectiveness of the three adjustment procedures in controlling the Type I error rate.
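As an illustration of the three adjustment procedures named above, here is a minimal, self-contained sketch; the implementation (clipping at 1, step-down/step-up ordering) follows the standard textbook definitions rather than anything specific to the dissertation:

```python
import numpy as np

def adjust_pvalues(p, method="holm"):
    """Adjust per-item DIF p-values for multiple testing.

    Returns adjusted p-values in the original item order.
    """
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]                       # ascending p-values
    if method == "bonferroni":
        adj = np.minimum(ranked * m, 1.0)
    elif method == "holm":                  # step-down: running max of (m - i) * p_(i)
        adj = np.minimum(np.maximum.accumulate((m - np.arange(m)) * ranked), 1.0)
    elif method == "bh":                    # step-up FDR: running min from the largest p
        adj = np.minimum.accumulate((m / np.arange(m, 0, -1)) * ranked[::-1])[::-1]
        adj = np.minimum(adj, 1.0)
    else:
        raise ValueError(method)
    out = np.empty(m)
    out[order] = adj
    return out

# e.g. adjust_pvalues([0.002, 0.01, 0.02, 0.2, 0.4], "bh")
```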
10

Zhao, Jing. "Contextual Differential Item Functioning: Examining the Validity of Teaching Self-Efficacy Instruments Using Hierarchical Generalized Linear Modeling." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339551861.

11

De Bruin, Ilse. "Exploring how objects used in a Picture Vocabulary Test influence validity." Diss., University of Pretoria, 2010. http://hdl.handle.net/2263/25218.

Abstract:
Multilingualism in the classroom is one of the many challenges currently facing the South African education system, and globalisation and migration have added further diversity to already diverse classrooms. In South Africa the spotlight is on equality: equality is expected in the education system, in the classroom, and especially in tests. With 11 official languages, excluding the additional languages of foreign learners, it has become a daunting task to create tests that are fair across the multilingual learners in one classroom, and items that function differently from one group to another can yield biased marks. An investigation was therefore conducted to detect any biased items present in a Picture Vocabulary Test. The study was led by the main research question: How do objects used in a Picture Vocabulary Test influence the level of validity? Two sub-questions followed: To what extent is a unidimensional trait measured by a Picture Vocabulary Test? And to what extent do the items in a Picture Vocabulary Test perform the same for the different language groups? The Picture Vocabulary Test was administered to Grade 1 learners in Afrikaans-, English-, and Sepedi-speaking schools within Pretoria, Gauteng; the sample totalled 1,361 learners. The analysis used the Rasch model, with which a differential item functioning (DIF) analysis was conducted to investigate whether biased items were present in the test. The aim of this study is to create greater awareness of how biased items in tests can be detected and resolved. The results showed that the items in the Picture Vocabulary Test all tested vocabulary, although some items did perform differently across the three language groups participating in the study.
Dissertation (MEd)--University of Pretoria, 2010.
Science, Mathematics and Technology Education
12

Ramstedt, Kristian. "Elektriska flickor och mekaniska pojkar : Om gruppskillnader på prov - en metodutveckling och en studie av skillnader mellan flickor och pojkar på centrala prov i fysik." Doctoral thesis, Umeå universitet, Pedagogiska institutionen, 1996. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-16582.

Abstract:
This dissertation served two purposes. The first was to develop a method of detecting differential item functioning (DIF) within tests containing both dichotomously and polytomously scored items. The second was related to gender and aimed (a) to investigate whether the items that functioned differently for girls and boys showed any characteristic properties and, if so, (b) to determine whether these properties could be used to predict which items would be flagged for DIF. The method development was based on the Mantel-Haenszel (MH) method used for dichotomously scored items. By dichotomizing the polytomously scored items, both types of item could be compared on the same statistical level, as either solved or non-solved. It was not possible to compare the internal score structures for the two gender groups; only overall score differences were detected. By modelling the empirical item characteristic curves it was possible to develop an MH method for identifying nonuniform DIF. Both internal and external ability criteria were used. Total test score with no purification was used as the internal criterion; purification was not done for validity reasons, as no items were judged to be biased. Teacher-set marks were used as the external criterion. The marking scale had to be transformed for either boys or girls, since a comparison of scores for boys and girls with the same marks showed that boys always obtained higher mean scores. The results of the two MH analyses based on the internal and external criteria were compared with results from P-SIBTEST. All three methods corresponded well, although P-SIBTEST flagged considerably more items in favour of the reference group (boys), which exhibited higher overall ability. All 200 items included in the last 15 annual national tests in physics were analysed for DIF and classified by ten criteria. The most significant result was that items on electricity were, to a significantly higher degree, flagged as DIF in favour of girls, whilst items on mechanics were flagged in favour of boys. Items in other content areas showed no significant pattern. Multiple-choice items were flagged in favour of boys. Regardless of the degree of significance by which items from different content areas were flagged at the group level, it was not possible to predict which single item would be flagged for DIF; the most probable prediction was always that an item was neutral. Some possible interpretations of DIF as an effect of multidimensionality were discussed, as were some hypotheses about why boys did better in mechanics and girls in electricity.
13

Lopez, Gabriel E. "Detection and Classification of DIF Types Using Parametric and Nonparametric Methods: A comparison of the IRT-Likelihood Ratio Test, Crossing-SIBTEST, and Logistic Regression Procedures." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4131.

Abstract:
The purpose of this investigation was to compare the efficacy of three methods for detecting differential item functioning (DIF). The performance of the crossing simultaneous item bias test (CSIBTEST), the item response theory likelihood ratio test (IRT-LR), and logistic regression (LOGREG) was examined across a range of experimental conditions, including different test lengths, sample sizes, DIF and differential test functioning (DTF) magnitudes, and mean differences in the underlying trait distributions of the comparison groups, herein referred to as the reference and focal groups. In addition, each procedure was implemented using both an all-other anchor approach, in which the IRT-LR baseline model, CSIBTEST matching subtest, and LOGREG trait estimate were based on all test items except the one under study, and a constant anchor approach, in which the baseline model, matching subtest, and trait estimate were based on a predefined subset of DIF-free items. Response data for the reference and focal groups were generated using known item parameters based on the three-parameter logistic item response theory model (3-PLM). Various types of DIF were simulated by shifting the generating item parameters of select items to achieve the desired DIF and DTF magnitudes, based on the area between the groups' item response functions. Power, Type I error, and Type III error rates were computed for each experimental condition based on 100 replications, and effects were analyzed via ANOVA. Results indicated that the procedures varied in efficacy, with LOGREG implemented using the all-other approach providing the best balance of power and Type I error rate. However, none of the procedures was effective at identifying the type of DIF that was simulated.
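The DIF magnitudes described here are calibrated through the area between the groups' item response functions; a hedged numerical sketch under the 3PLM follows (parameter values and the 1.7 scaling constant are illustrative, not the study's):

```python
import numpy as np

def irf_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def unsigned_area(ref, foc, lo=-4.0, hi=4.0, n=801):
    """Numerical unsigned area between reference and focal IRFs."""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(irf_3pl(theta, *ref) - irf_3pl(theta, *foc))
    return np.trapz(gap, theta)

# e.g. shifting the focal difficulty by 0.5 yields a moderate area
print(unsigned_area(ref=(1.0, 0.0, 0.2), foc=(1.0, 0.5, 0.2)))
```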
14

Awuor, Risper Akelo. "Effect of Unequal Sample Sizes on the Power of DIF Detection: An IRT-Based Monte Carlo Study with SIBTEST and Mantel-Haenszel Procedures." Diss., Virginia Tech, 2008. http://hdl.handle.net/10919/28321.

Abstract:
This simulation study focused on determining the effect of unequal sample sizes on the statistical power of the SIBTEST and Mantel-Haenszel procedures for detecting DIF of moderate and large magnitudes. Item parameters were estimated by, and generated with, the 2PLM using WinGen2 (Han, 2006). MULTISIM was used to simulate ability estimates and to generate response data that were analyzed by SIBTEST. The SIBTEST procedure with regression correction was used to calculate the DIF statistics, namely the DIF effect size and the statistical significance of the bias. The older SIBTEST was used to calculate the DIF statistics for the M-H procedure. SAS provided the environment in which the ability parameters were simulated, response data were generated, and DIF analyses were conducted. Test items were examined to determine whether a priori manipulated items demonstrated DIF. The study results indicated that with unequal samples in any ratio, M-H had better Type I error rate control than SIBTEST. The results also indicated that not only the ratios but also the sample size and the magnitude of DIF influenced the error rate behavior of SIBTEST and M-H. With small samples and moderate DIF magnitude, both M-H and SIBTEST committed Type II errors when the reference-to-focal-group sample size ratio was 1:0.1, due to low observed statistical power and inflated Type I error rates.
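Generating dichotomous response data under the 2PLM, as described above, takes only a few lines; a sketch with illustrative parameters (WinGen2 and MULTISIM are the tools actually used in the study; everything below is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_2pl(a, b, theta):
    """Dichotomous response matrix under the 2PL: rows = examinees."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))  # n_persons x n_items
    return (rng.random(p.shape) < p).astype(int)

theta_ref = rng.normal(0, 1, 1000)       # reference-group abilities
theta_foc = rng.normal(0, 1, 100)        # focal group, e.g. a 1:0.1 ratio
a = rng.uniform(0.8, 2.0, 40)            # discriminations
b = rng.normal(0, 1, 40)                 # difficulties
b_foc = b.copy(); b_foc[0] += 0.6        # plant uniform DIF in item 1
X_ref = simulate_2pl(a, b, theta_ref)
X_foc = simulate_2pl(a, b_foc, theta_foc)
```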
Ph. D.
15

Chun, Seokjoon. "Using MIMIC Methods to Detect and Identify Sources of DIF among Multiple Groups." Scholar Commons, 2014. https://scholarcommons.usf.edu/etd/5352.

Abstract:
This study investigated the efficacy of multiple indicators, multiple causes (MIMIC) methods in detecting uniform and nonuniform differential item functioning (DIF) among multiple groups where the underlying causes of DIF were different. Three implementations of MIMIC DIF detection were studied: sequential free baseline, free baseline, and constrained baseline. In addition, the robustness of the MIMIC methods against violation of their assumption of equal factor variance across comparison groups was investigated. We found that the sequential free baseline method provided Type I error and power rates similar to those of the free baseline method with a designated anchor, and much better Type I error and power rates than the constrained baseline method, across four groups formed by co-occurring background variables. When the equal factor variance assumption was violated, however, the MIMIC methods yielded inflated Type I error rates. The MIMIC procedure also had problems correctly identifying the sources of DIF, so further methodological developments are needed.
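For reference, a common formulation of the MIMIC DIF model is sketched below; the notation is generic rather than the dissertation's own:

```latex
% Latent trait with group impact (z_i = grouping covariate):
\eta_i = \gamma z_i + \zeta_i ,
\qquad
% underlying response variate for item j:
y_{ij}^{*} = \lambda_j \eta_i + \beta_j z_i + \varepsilon_{ij} .
% \gamma captures impact (a true trait difference); \beta_j \neq 0 flags
% uniform DIF in item j, and nonuniform DIF can be modeled by adding an
% interaction term \omega_j (\eta_i z_i).
```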
16

Anjorin, Idayatou. "HIGH-STAKES TESTS FOR STUDENTS WITH SPECIFIC LEARNING DISABILITIES: DISABILITY-BASED DIFFERENTIAL ITEM FUNCTIONING." Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1967913321&sid=3&Fmt=2&clientId=1509&RQT=309&VName=PQD.

Abstract:
Thesis (Ph. D.)--Southern Illinois University Carbondale, 2009.
"Department of Educational Psychology and Special Education." Includes bibliographical references (p. 110-126). Also available online.
17

Li, Yanju. "Item Discrimination and Type I Error Rates in DIF Detection Using the Mantel-Haenszel and Logistic Regression Procedures." Ohio University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1339428784.

18

Wright, Keith D. "Improvements for Differential Functioning of Items and Tests (DFIT): Investigating the Addition of Reporting an Effect Size Measure and Power." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/eps_diss/80.

Abstract:
Standardized testing has been part of the American educational system for decades and has been controversial from the beginning; it remains so today. Given current federal educational policies supporting increased standardized testing, psychometricians, educators, and policy makers must seek ways to ensure that tests are not biased toward one group over another. In measurement theory, if a test item behaves differently for two groups of examinees matched on ability, the item is said to exhibit differential item functioning (DIF). DIF, often conceptualized in the context of item response theory (IRT), describes test items that may favor one group over another after matching on ability. It is important to determine whether an item functions significantly differently for one group, regardless of why. Hypothesis testing is used to identify statistically significant DIF items; an effect size measure quantifies a statistically significant difference. This study investigated adding an effect size measure, and reporting empirically observed power, for the noncompensatory differential item functioning (NCDIF) index of the differential functioning of items and tests (DFIT) framework. The Mantel-Haenszel (MH) parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large DIF in test items. In addition, by modifying NCDIF's unique method for determining statistical significance, NCDIF becomes the first item-level DIF statistic for which empirical power can be reported alongside an effect size measure. Furthermore, this study added substantially to the body of literature on effect size by also investigating the behavior of two other DIF measures, the Simultaneous Item Bias Test (SIBTEST) and the area measure. Finally, this study makes a significant contribution by verifying, in a large-scale simulation study, the accuracy of software developed by Roussos, Schnipke, and Pashley (1999) to calculate the true MH parameter; the accuracy of this software had not been previously verified.
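NCDIF itself has a compact definition: the expected squared gap between the focal- and reference-calibrated item response functions, taken over the focal ability distribution. A minimal Monte Carlo sketch under the 2PL with hypothetical parameters (the DFIT framework estimates these from data; this toy assumes them known):

```python
import numpy as np

rng = np.random.default_rng(0)

def irf_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# item parameters as calibrated in the focal (F) and reference (R) groups
a_f, b_f = 1.2, 0.30
a_r, b_r = 1.2, 0.05

theta_focal = rng.normal(0.0, 1.0, 100_000)     # focal ability draws
gap = irf_2pl(theta_focal, a_f, b_f) - irf_2pl(theta_focal, a_r, b_r)
ncdif = np.mean(gap ** 2)   # noncompensatory DIF index for this item
```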
19

AGUIAR, GLAUCO DA SILVA. "A COMPARATIVE STUDY AMONG BRAZIL AND PORTUGAL ABOUT THE DIFFERENCES IN THE CURRICULAR EMPHASES IN MATHEMATICS USING THE ANALYSIS OF THE DIFFERENTIAL ITEM FUNCTIONING (DIF) FROM PISA 2003." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=12869@1.

Abstract:
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
This study compares the differences in curricular emphases in mathematics in Brazil and Portugal using results from the Programme for International Student Assessment (PISA) 2003. The participants in this programme are 15-year-old students from the member countries of the Organisation for Economic Co-operation and Development (OECD) and from partner countries; its aim is to assess how well these students master the essential skills and knowledge needed to meet real-life challenges. Based on the existing literature on the official, taught, and learned curricula, this study assumes that the results of several countries in international surveys constitute a strategy for analysing the learned curriculum and the pedagogical emphases in mathematics. The methodology used to identify curricular differences, as well as pedagogical and sociocultural approaches, is the analysis of differential item functioning (DIF). An item presents differential functioning when students from different countries who have the same cognitive ability do not have the same probability of answering the item correctly. The results show that some mathematics items present differential functioning between Brazilian and Portuguese students. The aspects that explain this differential functioning are related to differential emphases not only on certain mathematics contents but also on cognitive processes and on item format.
20

Burkes, LaShona L. "Identifying differential item functioning related to student socioeconomic status and investigating sources related to classroom opportunities to learn." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 152 p, 2009. http://proquest.umi.com/pqdweb?did=1818417291&sid=5&Fmt=2&clientId=8331&RQT=309&VName=PQD.

21

Maia, José Leudo. "Uso da Teoria Clássica dos Testes – TCT e da Teoria de Resposta ao Item – TRI na avaliação da qualidade métrica de testes de seleção." Universidade Federal do Ceará, 2009. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=4951.

Abstract:
Fundação de Amparo à Pesquisa do Estado do Ceará
This doctoral work proposes to use Classical Test Theory (CTT) and Item Response Theory (IRT) as instruments for evaluating the metric quality of selection tests, under four aspects of investigation: construct validity analysis, psychometric analysis of the items, differential item functioning (DIF), and the information function. Data came from the Portuguese and Mathematics tests of the 2007 entrance examination of the Universidade Estadual do Ceará (UECE), taken by 20,016 candidates for 38 undergraduate programmes in the state capital alone. The data were processed with SPSS v15, BILOG-MG v3.0, MULTILOG FOR WINDOWS v1.0, and TESTFACT v4.0. The first step was to verify the dimensionality of the tests, using the Kaiser-Guttman method, the scree plot, and the method of factor loadings and communalities of the factor matrix. The Portuguese test showed multidimensional characteristics and was therefore discarded for failing to meet the basic assumptions of unidimensionality and local independence of the items. The Mathematics test, which behaved unidimensionally, became the focus of this work. Construct validity was analysed with the Cronbach's alpha and Kuder-Richardson coefficients, both equal to 0.685, and with the factor-loading method, with loadings between 0.837 and 0.960, indicating strong internal consistency. The psychometric analysis of the items used difficulty, discrimination, and guessing indices under both theories, indicating a test of median difficulty with good discrimination and a low rate of correct guessing. The DIF analysis, by candidate gender, used the delta-plot, Mantel-Haenszel, logistic regression, and beta-comparison methods; the results were not statistically significant, leading to the conclusion that the test did not function differentially by gender. The analysis of the test information function showed that the test is particularly valid for candidates with ability around 0.8750 and that, at a 95% confidence level, 49.3% of the candidates met this indication. It was also observed that 90.6% of the candidates showed the same ability level under both procedures, indicating quite reasonable convergence between the results generated by CTT and IRT; in the sample study, however, IRT identified 9.4% of the candidates as having greater aptitude for higher education than those selected by CTT.
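The test information function reported above has a closed form under IRT; a sketch under the 2PL (illustrative item parameters, not the UECE calibration):

```python
import numpy as np

def test_information(theta, a, b):
    """Fisher information of the total test at each theta under the 2PL."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (a ** 2 * p * (1.0 - p)).sum(axis=1)

theta = np.linspace(-3, 3, 121)
a = np.full(30, 1.0)
b = np.random.default_rng(2).normal(0.8, 0.7, 30)   # difficulties clustered near 0.8
info = test_information(theta, a, b)
se = 1.0 / np.sqrt(info)          # standard error of ability estimates
print(theta[np.argmax(info)])     # ability where the test is most precise
```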
22

MAIA, José Leudo. "Uso da Teoria Clássica dos Testes – TCT e da Teoria de Resposta ao Item – TRI na avaliação da qualidade métrica de testes de seleção." http://www.teses.ufc.br, 2009. http://www.repositorio.ufc.br/handle/riufc/3235.

Abstract:
MAIA, José Leudo. Uso da Teoria Clássica dos Testes – TCT e da Teoria de Resposta ao Item – TRI na avaliação da qualidade métrica de testes de seleção. 2009. 325f. Tese (Doutorado em Educação) – Universidade Federal do Ceará, Faculdade de Educação, Programa de Pós-Graduação em Educação Brasileira, Fortaleza-CE, 2009.
23

Roomaney, Rizwana. "Towards establishing the equivalence of the IsiXhosa and English versions of the Woodcock Munoz language survey : an item and construct bias analysis of the verbal analogies scale." Thesis, University of the Western Cape, 2010. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_7549_1306830207.

Abstract:

This study formed part of a larger project concerned with the adaptation of a test of cognitive academic language proficiency, the Woodcock Muñoz Language Survey (WMLS). The WMLS has been adapted from English into isiXhosa, and the present study is located within the broader study concerned with establishing overall equivalence between the two language versions of the WMLS. It was primarily concerned with the Verbal Analogies (VA) scale. Previous research on this scale has demonstrated promising results, but continues to find evidence of some inequivalence. This study aimed to cross-validate previous research on the two language versions of the WMLS and improve on methodological issues by employing matched groups. It drew upon an existing dataset from the larger research project. The study employed a monolingual matched two-group design consisting of 150 mainly English-speaking and 149 mainly isiXhosa-speaking learners in grades 6 and 7. This study had two sub-aims. The first was to investigate item bias by identifying DIF items in the VA scale across the isiXhosa and English versions by conducting a logistic regression and Mantel-Haenszel procedure. Five items were identified by both techniques as DIF. The second sub-aim was to evaluate construct equivalence between the isiXhosa and English versions of the WMLS on the VA scale by conducting a factor analysis after removal of the DIF items. Two factors were requested during the factor analysis. The first factor displayed significant loadings across both language versions and was identified as a stable factor; this was confirmed by Tucker's phi and a scatter plot. The second factor was stable for the English version but not for the isiXhosa version. Tucker's phi and the scatter plot indicated that this factor is not structurally equivalent across the two language versions.

24

Price, Emily A. "Item Discrimination, Model-Data Fit, and Type I Error Rates in DIF Detection using Lord's χ2, the Likelihood Ratio Test, and the Mantel-Haenszel Procedure." Ohio University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1395842816.

25

Haag, Nicole. "Differenzielle Validität von Mathematiktestaufgaben für Kinder mit nicht-deutscher Familiensprache." Doctoral thesis, Humboldt-Universität zu Berlin, Lebenswissenschaftliche Fakultät, 2015. http://dx.doi.org/10.18452/17398.

Abstract:
Large-scale assessment studies have repeatedly documented performance disadvantages of language minority students in German elementary schools. The substantial achievement gap has led to concerns regarding the validity of large-scale assessment items for language minority students. It may be the case that these performance differences are, in part, due to high language demands of the test items. These items may selectively disadvantage language minority students in the testing situation. This dissertation project investigated the connection between the academic language demands of mathematics test items and the test performance of monolingual students and language minority students. First, it was investigated whether the test items were differentially valid for language minority students. Moreover, the connection between the differential validity and the linguistic complexity of the test items was tested. The findings indicated that overall, differential validity of the examined tests for language minority students was low. However, the test items’ language demands were related to differential validity. The largest proportion of item-specific performance disadvantages was explained by confounded combinations of several linguistic features. Additionally, unique effects of descriptive, lexical, and grammatical features were identified. An experimental study showed that linguistic simplification did not seem to be a promising method to substantially reduce the performance differences between language minority students and German monolingual students. A comparison of differential effects of mathematics items’ language demands for language minority students over two adjacent grade levels indicated that the impact of academic language demands seemed to depend on grade level rather than on language minority student status. Regardless of their home language, younger students seemed to struggle more with linguistically complex test items than older students.
26

Sanguras, Laila Y. "Construct Validation and Measurement Invariance of the Athletic Coping Skills Inventory for Educational Settings." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984216/.

Abstract:
The present study examined the factor structure and measurement invariance of a revised version of the Athletic Coping Skills Inventory (ACSI-28), following adjustment of the wording of items so that they were appropriate for assessing coping skills in an educational setting. A sample of middle school students (n = 1,037) completed the revised inventory. An initial confirmatory factor analysis led to the hypothesis of a better-fitting model with two items removed. Reliability of the subscales and of the instrument as a whole was acceptable. Items were examined for sex invariance (differential item functioning, DIF) using item response theory, and five items were flagged for significant sex non-invariance. Following removal of these items, comparison of the mean differences between male and female coping scores revealed no significant difference between the two groups. Further examination of the generalizability of the coping construct and of the potential transfer of psychosocial skills between athletic and academic settings is warranted.
27

O'Brien, Erin L. "USING DIFFERENTIAL FUNCTIONING OF ITEMS AND TESTS (DFIT) TO EXAMINE TARGETED DIFFERENTIAL ITEM FUNCTIONING." Wright State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=wright1421955213.

28

Jiang, Jing. "Regularization Methods for Detecting Differential Item Functioning." Thesis, Boston College, 2019. http://hdl.handle.net/2345/bc-ir:108404.

Abstract:
Thesis advisor: Zhushan Mandy Li
Differential item functioning (DIF) occurs when examinees of equal ability from different groups have different probabilities of correctly responding to certain items. DIF analysis aims to identify potentially biased items to ensure the fairness and equity of instruments, and has become a routine procedure in developing and improving assessments. This study proposed a DIF detection method using regularization techniques, which allows simultaneous investigation of all items on a test for both uniform and nonuniform DIF. To evaluate the performance of the proposed DIF detection models and understand the factors that influence their performance, comprehensive simulation studies and empirical data analyses were conducted. Under various conditions, including test length, sample size, sample size ratio, percentage of DIF items, DIF type, and DIF magnitude, the operating characteristics of three kinds of regularized logistic regression models, lasso, elastic net, and adaptive lasso, each characterized by its penalty function, were examined and compared. Selection of the optimal tuning parameter was investigated using two well-known information criteria, AIC and BIC, and cross-validation. The results revealed that BIC outperformed the other model selection criteria: it not only flagged high-impact DIF items precisely but also prevented over-identification of DIF items, with few false alarms. Among the regularization models, the adaptive lasso model achieved superior performance to the other two models in most conditions. The performance of the regularized DIF detection model using the adaptive lasso was then compared to two commonly used DIF detection approaches, the logistic regression method and the likelihood ratio test. The proposed model was applied to empirical datasets to demonstrate the applicability of the method in real settings.
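In the spirit of the regularization approach described here, a single-item sketch using scikit-learn's L1-penalized logistic regression is shown below. Note the simplifications: the dissertation penalizes the DIF parameters of all items jointly and tunes the penalty with AIC/BIC or cross-validation, whereas this sketch fixes an illustrative penalty C for one item:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lasso_dif_flags(item, theta, group, C=0.1):
    """L1-penalized logistic regression for one item.

    Nonzero coefficients on the group and theta*group columns flag
    uniform and nonuniform DIF, respectively.
    """
    X = np.column_stack([theta, group, theta * group])
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, item)
    coef = fit.coef_.ravel()   # [ability, group, interaction]
    return {"uniform_dif": coef[1] != 0.0, "nonuniform_dif": coef[2] != 0.0}
```

In practice one would scan a grid of C values and keep the fit with the lowest BIC, mirroring the tuning-parameter selection the study found most effective.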
Thesis (PhD) — Boston College, 2019
Submitted to: Boston College. Lynch School of Education
Discipline: Educational Research, Measurement and Evaluation
29

McBride, Nadine LeBarron. "Differential Item Functioning on the International Personality Item Pool's Neuroticism Scale." Diss., Virginia Tech, 2008. http://hdl.handle.net/10919/29999.

Abstract:
As use of the public-domain International Personality Item Pool (IPIP) scales has grown significantly over the past decade (Goldberg, Johnson, Eber, Hogan, Ashton, Cloninger, & Gough, 2006), research on the psychometric properties of the items and scales has become increasingly important. This study examined the IPIP scale constructed to measure the Five Factor Model (FFM) domain of Neuroticism (as measured by the NEO-PI-R) for differential functioning at both the item and test level, by gender and by three age ranges, using the DFIT framework (Raju, van der Linden, & Fleer, 1993). The study found six items that displayed differential item functioning by gender and three items that displayed differential item functioning by age. No differential functioning at the test level was found. Items demonstrating DIF and implications for potential scale revision are discussed.
Ph. D.
30

Li, Zhen. "Impact of differential item functioning on statistical conclusions." Thesis, University of British Columbia, 2009. http://hdl.handle.net/2429/14680.

Abstract:
Differential item functioning (DIF), sometimes called item bias, has been widely studied in educational and psychological measurement; however, to date, research has focused on the definitions of, and the methods for, detecting DIF. It is well accepted that the presence of DIF may degrade the validity of a test. There is relatively little known, however, about the impact of DIF on later statistical decisions when one uses the observed test scores in data analyses and corresponding statistical hypothesis tests. This dissertation investigated the impact of DIF on later statistical decisions based on the observed total test (or scale) score. Very little is known in the literature about the impact of DIF on the Type I error rate and effect size of, for instance, the independent samples t-test on the observed total test scores. Five studies were conducted: studies one to three investigated the impact of unidirectional DIF (i.e., DIF amplification) on the Type I error rate and effect size of the independent samples t-test; studies four and five investigated the DIF cancellation effects on the Type I error rate and effect size of the independent samples t-test. The Type I error rate and effect size were defined in terms of latent population means rather than observed sample means. The results showed that the amplification and cancellation effects among uniform DIF items did transfer to the test level. Both the Type I error rate and effect size were inflated. The degree of inflation depends on the number of DIF items, magnitude of DIF, sample sizes, and interactions among these factors. These findings highlight the importance of screening DIF before conducting any further statistical analysis. It offers advice to practicing researchers about when and how much the presence of DIF will affect their statistical conclusions based on the total observed test scores.
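The core design of studies one to three can be sketched compactly: keep the latent group means equal, plant uniform DIF in a few items, and record how often the t-test on observed total scores rejects. A hedged toy version, with all parameter values illustrative rather than the dissertation's:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)

def rejection_rate(n_dif_items, shift, n_items=30, n=200, reps=2000):
    """Two groups with IDENTICAL latent means; uniform DIF in some items.

    Returns how often the t-test on total scores rejects at alpha = .05;
    anything above .05 is Type I error inflation caused by DIF alone.
    """
    a = np.ones(n_items)
    b = rng.normal(0, 1, n_items)
    b2 = b.copy()
    b2[:n_dif_items] += shift               # items harder for group 2
    rejections = 0
    for _ in range(reps):
        th1, th2 = rng.normal(0, 1, n), rng.normal(0, 1, n)
        p1 = 1 / (1 + np.exp(-a * (th1[:, None] - b)))
        p2 = 1 / (1 + np.exp(-a * (th2[:, None] - b2)))
        x1 = (rng.random(p1.shape) < p1).sum(axis=1)   # observed total scores
        x2 = (rng.random(p2.shape) < p2).sum(axis=1)
        rejections += ttest_ind(x1, x2).pvalue < 0.05
    return rejections / reps
```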
31

Samuelsen, Karen Marie. "Examining differential item functioning from a latent class perspective." College Park, Md. : University of Maryland, 2005. http://hdl.handle.net/1903/2682.

Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2005.
Thesis research directed by: Measurement, Statistics and Evaluation. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
32

Bryant, Damon. "THE EFFECTS OF DIFFERENTIAL ITEM FUNCTIONING ON PREDICTIVE BIAS." Doctoral diss., University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2237.

Abstract:
The purpose of this research was to investigate the relation between measurement bias at the item level (differential item functioning, DIF) and predictive bias at the test score level. DIF was defined as a difference in the probability of getting a test item correct for examinees with the same ability but from different subgroups. Predictive bias was defined as a difference in subgroup regression intercepts and/or slopes in predicting a criterion. Data were simulated by computer. Two hypothetical subgroups (a reference group and a focal group) were used. The predictor was a composite score on a dimensionally complex test with 60 items. Sample size (35, 70, and 105 per group), validity coefficient (.3 or .5), and the mean difference on the predictor (0, .33, .66, and 1 standard deviation, SD) and the criterion (0 and .35 SD) were manipulated. The percentage of items showing DIF (0%, 15%, and 30%) and the effect size of DIF (small = .3, medium = .6, and large = .9) were also manipulated. Each of the 432 conditions in the 3 x 2 x 4 x 2 x 3 x 3 design was replicated 500 times. For each replication, a predictive bias analysis was conducted, and the detection of predictive bias against each subgroup was the dependent variable. The percentage of DIF items and the effect size of DIF were hypothesized to influence the detection of predictive bias; hypotheses were also advanced about the influence of sample size and mean subgroup differences on the predictor and criterion. Results indicated that DIF was not related to the probability of detecting predictive bias against any subgroup. Results were inconsistent with the notion that measurement bias and predictive bias are mutually supportive, i.e., that the presence (or absence) of one type of bias is evidence in support of the presence (or absence) of the other type of bias. Sample size and mean differences on the predictor/criterion had direct and indirect effects on the probability of detecting predictive bias against both reference and focal groups. Implications for future research are discussed.
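The predictive bias analysis described here, testing subgroup differences in regression intercepts and slopes, is the classic Cleary-style moderated regression; a minimal sketch with hypothetical variable names, not the dissertation's code:

```python
import numpy as np
import statsmodels.api as sm

def predictive_bias_test(x, y, group):
    """Cleary-style test: do intercepts and/or slopes differ by subgroup
    when predicting criterion y from composite score x?

    group is coded 0 = reference, 1 = focal.
    """
    X = sm.add_constant(np.column_stack([x, group, x * group]))
    fit = sm.OLS(y, X).fit()
    # p-value for the group main effect (intercept bias) and for the
    # x-by-group interaction (slope bias)
    return fit.pvalues[2], fit.pvalues[3]
```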
Ph.D.
Department of Psychology
Arts and Sciences
Psychology
33

Greenberg, Stuart Elliot. "Differential item functioning on the Myers-Briggs type indicator." Diss., Virginia Tech, 1992. http://hdl.handle.net/10919/38455.

Abstract:
Differential item functioning on the Myers-Briggs Type Indicator (MBTI) was examined in regard to gender. The Myers-Briggs has a differential scoring system for males and females on its Thinking/Feeling subscale. This scoring system preserves the 60% thinking male and 30% thinking female proportion that is implied by the Jungian theory underlying the Indicator. The MBTI's authors contended that the sex-based differential scoring system corrects items that subjects at a certain level of a latent trait either incorrectly endorse or leave blank. This reasoning is the classical definition of differential item functioning (DIF); consequently, the non-differentially scored items should exhibit DIF. If these items do not show DIF, then there would be no reason to use a differential scoring system. Although the Indicator has been in use for several decades, no rigorous item response theory (IRT) item-level analysis of the Indicator had been undertaken. IRT analysis allows for mean differences in subgroups to occur, independent of the question of DIF. Linn and Harnisch's (1981) pseudo-IRT analysis was chosen to test for the presence of DIF in the MBTI items because it is best for tests of relatively short length; the Myers-Briggs subscales range from 22 to 26 items, which is relatively small by IRT standards. IRT analyses conducted on N = 1,887 subjects indicated that no items on the Thinking/Feeling subscale showed evidence of DIF. Out of 94 items, only one Extraversion/Introversion item and one Judging/Perception item showed evidence of DIF; no Thinking/Feeling items showed DIF. It is recommended that sex-based differential MBTI scoring be abandoned, and that the distribution of type in the population be examined in future studies.
Ph. D.
34

Henderson, Dianne L. "Investigation of differential item functioning in exit examinations across item format and subject area." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape8/PQDD_0019/NQ46848.pdf.

35

Chen, Dong Qi Kayla. "Gender-related differential item functioning analysis on the GEPT-kids." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3953512.

36

Gibson, Shanan Gwaltney IV. "Differential Item Functioning on the Armed Services Vocational Aptitude Battery." Thesis, Virginia Tech, 1998. http://hdl.handle.net/10919/37047.

Abstract:
Utilizing item response theory (IRT) methodologies, the Armed Services Vocational Aptitude Battery (ASVAB) was examined for differential item functioning (DIF) on the basis of crossed gender and ethnicity variables. Both the Mantel-Haenszel procedure and an IRT area-based technique were utilized to assess the degree of uniform and nonuniform DIF in a sample of ASVAB takers. The analysis was performed such that each subgroup of interest functioned as the focal group to be compared to the male reference group. This type of DIF analysis allowed for comparisons within ethnic group, within gender group, and across crossed ethnic/gender groups. The groups analyzed were White, Black, and Hispanic males, and White and Black females. It was hypothesized that DIF would be found at the scale level on several of the ASVAB subtests as a result of unintended latent trait demands of items. In particular, tests composed of items requiring specialized jargon, visuospatial ability, or advanced English vocabulary were anticipated to show bias toward White males and/or White females. Findings were mixed. At the item level, DIF fluctuated greatly: numerous instances of DIF favoring the reference as well as the focal group were found. At the scale level, inconsistencies existed across the forms and versions. Tests varied in their tendency to be biased against the focal group of interest and at times performed contrary to expectations.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
37

Gratias, Melissa B. "Gender and Ethnicity-Based Differential Item Functioning on the Myers-Briggs Type Indicator." Thesis, Virginia Tech, 1997. http://hdl.handle.net/10919/30362.

Full text
Abstract:
Item Response Theory (IRT) methodologies were employed in order to examine the Myers-Briggs Type Indicator (MBTI) for differential item functioning (DIF) on the basis of crossed gender and ethnicity variables. White males were the reference group, and the focal groups were black females, black males, and white females. The MBTI was predicted to show DIF in all comparisons. In particular, DIF on the Thinking-Feeling scale was hypothesized, especially in the comparisons between white males and black females and between white males and white females. A sample of 10,775 managers who took the MBTI at assessment centers provided the data for the present study. The Mantel-Haenszel procedure and an IRT-based area technique were the methods of DIF detection. Results showed several biased items on all scales for all comparisons. Ethnicity-based bias was seen in the white male vs. black female and white male vs. black male comparisons. Gender-based bias was seen particularly in the white male vs. white female comparisons. Contrary to predictions, however, the Thinking-Feeling scale showed the least DIF of all scales across comparisons, and only one of the items differentially scored by gender was found to be biased. Findings indicate that the gender-based differential scoring system is not defensible in managerial samples, and that there is a need for further research into differential item functioning with regard to ethnicity.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
38

Juve, John A. "Assessing differential item functioning and item parameter drift in the College Basic Academic Subjects Examination." Free to MU campus, to others for purchase, 2004. http://wwwlib.umi.com/cr/mo/fullcit?p3137717.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Li, Yong "Isaac." "Extending the Model with Internal Restrictions on Item Difficulty (MIRID) to Study Differential Item Functioning." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/6724.

Full text
Abstract:
Differential item functioning (DIF) is a psychometric issue routinely considered in educational and psychological assessment. However, it has not been studied in the context of a recently developed componential statistical model, the model with internal restrictions on item difficulty (MIRID; Butter, De Boeck, & Verhelst, 1998). Because the MIRID requires test questions measuring either single or multiple cognitive processes, it creates a complex environment for which traditional DIF methods may be inappropriate. This dissertation sought to extend the MIRID framework to detect DIF at the item-group level and the individual-item level. Such a model-based approach can increase the interpretability of DIF statistics by focusing on item characteristics as potential sources of DIF. In particular, group-level DIF may reveal comparative group strengths in certain secondary constructs. A simulation study was conducted to examine, under different conditions, parameter recovery, Type I error rates, and power of the proposed approach. Factors manipulated included sample size, magnitude of DIF, distributional characteristics of the groups, and the MIRID DIF models corresponding to discrete sources of differential functioning. The impact of studying DIF using the wrong models was investigated. The results from the recovery study of the MIRID DIF model indicate that the four delta (i.e., non-zero value DIF) parameters were underestimated, whereas item locations of the four associated items were overestimated. Bias and RMSE were significantly greater when delta was larger; larger sample size reduced RMSE substantially, while the effects of the impact factor were neither strong nor consistent. Hypothesiswise and adjusted experimentwise Type I error rates were controlled in smaller delta conditions but not in larger delta conditions, as estimates of zero-value DIF parameters were significantly different from zero. Detection power of the DIF model was weak. Estimates of the delta parameters of the three group-level DIF models, the MIRID differential functioning in components (DFFc), the MIRID differential functioning in item families (DFFm), and the MIRID differential functioning in component weights (DFW), were acceptable in general. They had good hypothesiswise and adjusted experimentwise Type I error control across all conditions and overall achieved excellent detection power. When fitting the proposed models to mismatched data, the false detection rates were mostly beyond the Bradley criterion because the zero-value DIF parameters in the mismatched model were not estimated adequately, especially in larger delta conditions. Recovery of item locations and component weights was also not adequate in larger delta conditions. Estimation of these parameters was more or less adversely affected by the DIF effect simulated in the mismatched data. To study DIF in MIRID data using the model-based approach, therefore, more research is necessary to determine the appropriate procedure or model to implement, especially for item-level differential functioning.
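As background for readers unfamiliar with the MIRID, a schematic statement of its core restriction may help. The notation follows the general form in Butter, De Boeck, and Verhelst (1998); the DIF shift shown afterwards is a generic illustration, not the dissertation's exact parameterization.

```latex
% Rasch-type response model with the MIRID restriction on the
% difficulty of the composite item in item family f:
P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)},
\qquad
\beta_{f,\mathrm{comp}} = \sum_{k=1}^{K} \sigma_k \, \beta_{fk} + \tau ,
% where \beta_{fk} is the difficulty of the k-th component item in family f,
% \sigma_k is the weight of component k, and \tau is an intercept common to
% all families. A generic item-level DIF extension adds a group shift:
\beta_i^{(g)} = \beta_i + \delta_i z_g , \qquad
z_g = 1 \text{ for the focal group, } 0 \text{ for the reference group.}
```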
APA, Harvard, Vancouver, ISO, and other styles
40

Pagano, Ian S. "Ethnic differential item functioning in the assessment of quality of life." Thesis, University of Hawaii at Manoa, 2003. http://hdl.handle.net/10125/3067.

Full text
Abstract:
Ethnic differential item functioning (DIF) on the QLQ-C30 quality of life (QoL) questionnaire for cancer patients was investigated using item response theory methods. The sample consisted of 359 cancer patients representing four ethnic groups: Caucasian, Filipino, Hawaiian, and Japanese. Results showed the presence of DIF on several items, indicating ethnic differences in the assessment of quality of life. Relative to the Caucasian and Japanese groups, items related to financial difficulties, need for rest, nausea or vomiting, emotional difficulties, and social difficulties exhibited DIF for Filipinos. On these items Filipinos exhibited lower QoL scores, even though overall QoL was not lower. This evidence may explain why Filipinos have previously been found to have lower overall QoL: although Filipinos score lower on QoL measures than other groups, this may not reflect lower QoL, but rather differences in how QoL is defined. Additionally, DIF did not appear to alter the psychometric properties of the QLQ-C30.
Thesis (Ph. D.)--University of Hawaii at Manoa, 2003.
Includes bibliographical references (leaves 41-48).
Mode of access: World Wide Web.
Also available by subscription via World Wide Web
vi, 48 leaves, bound 29 cm
APA, Harvard, Vancouver, ISO, and other styles
41

Zhang, Mo. "Gender related differential item functioning in mathematics tests: a meta-analysis." Pullman, Wash.: Washington State University, 2009. http://www.dissertations.wsu.edu/Thesis/Summer2009/m_zhang_072109.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Mtsatse, Nangamso. "Exploring Differential Item Functioning on reading achievement between English and isiXhosa." Diss., University of Pretoria, 2017. http://hdl.handle.net/2263/65447.

Full text
Abstract:
Post-Apartheid South Africa has undergone an educational language policy shift from Afrikaans and English only to the representation of all 11 official languages: Afrikaans, English, isiZulu, isiXhosa, isiNdebele, Sepedi, siSwati, Sesotho, Setswana, Tshivenda and Xitsonga. The national language policy includes the Language in Education Policy (LiEP), which stipulates that learners in Grades 1-3 should, in all ways possible, be provided the opportunity to be taught in their home language (HL). With this change, there has been a need to increase access to African languages in education. The 2007 Status of LoLT report released by the Department of Education (DoE) revealed that since 1996 up to 65% of learners in the foundation phase have been taught in their home language. In this way, the LiEP has been successful in bridging the gap of access to African languages in the basic education system. At the same time, there has been rapid growth of interest in early childhood cross-cultural literacy assessment across the globe. Internationally, South Africa has participated in the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ) as well as the Progress in International Reading Literacy Study (PIRLS). The design of these particular international studies meant participation in the same assessment but in different languages, calling into question the equivalence of assessments across languages. Assessing across languages should aim to ensure linguistic, functional, cultural and metric equivalence. South Africa has taken part in three cycles of PIRLS. The purpose of the current study is to present secondary analysis of the prePIRLS 2011 data and to investigate differential item functioning (DIF) in the achievement scores between English and isiXhosa. The Organisation for Economic Co-operation and Development (OECD) developed a framework of input, process and output for the curriculum process. The framework shows the multiple facets that need to be considered when implementing a curriculum in a country, and it was used as the theoretical framework for this study. The framework views curriculum success as a process of measuring how the intended curriculum (input) was implemented (process) and reflected in the attained curriculum (output). In the adapted framework, the LiEP is the intended curriculum, as learners in prePIRLS 2011 are tested in the LoLT of Grades 1-3; the prePIRLS 2011 assessment is the implemented curriculum, testing the comprehension skills required by Grade 4 in the learners' HL; and the attained curriculum refers to the learners' achievement scores in the prePIRLS 2011 study. A sample of 819 Grade 4 learners (539 English L1-speaking learners and 279 isiXhosa L1-speaking learners) who participated in the prePIRLS 2011 study was included in this study. These learners read a literary passage called The Lonely Giraffe, accompanied by 15 items. The study made use of the Rasch model to investigate any evidence of DIF in the learners' reading achievement. The findings showed that the items did not reflect an equal distribution, and an item-by-item DIF analysis revealed that certain items discriminated against one subgroup over the other. Further investigation showed that these discriminations could be explained by inaccurate linguistic equivalence, attributable to mistranslation and/or dialectal differences. Subsequently, the complexities of dialects in African languages are presented by providing alternative isiXhosa translations of the items. The significance of the current study lies in its potential contribution to a further understanding of language complexities in large-scale assessments, and in its attempt to provide valid, reliable and fair assessment data across subgroups.
Dissertation (MEd)--University of Pretoria, 2017.
Science, Mathematics and Technology Education
Centre for Evaluation & Assessment (CEA)
MEd
Unrestricted
APA, Harvard, Vancouver, ISO, and other styles
43

Brown, Paulette C. "An empirical study of the consistency of differential item functioning detection." Thesis, University of Ottawa (Canada), 1992. http://hdl.handle.net/10393/7928.

Full text
Abstract:
Total test scores of examinees on any given standardized test are used to provide reliable and objective information regarding the overall performance of the test takers. When the probability of successfully responding to a test item is not the same for examinees at the same ability level but from different groups, the item functions differentially in favour of one group over the other. This type of problem, defined as differential item functioning (DIF), creates a disadvantage for members of certain subgroups of test takers. Test items need to be accurate and valid measures for all groups because test results may be used to make significant decisions which may have an impact on the future opportunities available to test takers. Thus, DIF is an issue of concern in the field of educational measurement. The purpose of this study was to investigate how well the Mantel-Haenszel (MH) and logistic regression (LR) procedures perform in the identification of items that function differentially across gender groups and regional groups. Research questions to be answered by this study were concerned with three issues: (1) the detection rates for DIF items and items which did not exhibit DIF, (2) the agreement between the MH and LR methods in the detection of DIF items, and (3) the effectiveness of these indices across sample sizes and over replications. (Abstract shortened by UMI.)
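For readers unfamiliar with the logistic regression approach compared here, the following sketch tests one dichotomous item for uniform and non-uniform DIF via nested models, in the spirit of Swaminathan and Rogers (1990). The simulated data and variable names are my own assumptions, and the use of statsmodels is an implementation choice, not this dissertation's software.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def lr_dif(item, total, group):
    """Likelihood-ratio tests for uniform and non-uniform DIF on one item."""
    X1 = sm.add_constant(np.column_stack([total]))                        # matching score only
    X2 = sm.add_constant(np.column_stack([total, group]))                 # + group (uniform DIF)
    X3 = sm.add_constant(np.column_stack([total, group, total * group]))  # + interaction (non-uniform)
    ll = [sm.Logit(item, X).fit(disp=0).llf for X in (X1, X2, X3)]
    return (chi2.sf(2 * (ll[1] - ll[0]), df=1),   # p-value, uniform DIF
            chi2.sf(2 * (ll[2] - ll[1]), df=1))   # p-value, non-uniform DIF

# Hypothetical demonstration data with uniform DIF against the focal group.
rng = np.random.default_rng(1)
n = 1500
group = (rng.random(n) < 0.5).astype(float)   # 1 = focal, 0 = reference
theta = rng.normal(0.0, 1.0, n)
rest = (rng.random((n, 24)) < 1 / (1 + np.exp(-theta[:, None]))).astype(int)
item = (rng.random(n) < 1 / (1 + np.exp(-(theta - 0.6 * group)))).astype(int)
print(lr_dif(item, rest.sum(axis=1).astype(float), group))
```

The two p-values separate uniform from non-uniform DIF, which is one reason LR is often compared against MH: the standard MH procedure targets uniform DIF only.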
APA, Harvard, Vancouver, ISO, and other styles
44

Stephens-Bonty, Torie Amelia. "Using Three Different Categorical Data Analysis Techniques to Detect Differential Item Functioning." Digital Archive @ GSU, 2008. http://digitalarchive.gsu.edu/eps_diss/24.

Full text
Abstract:
Diversity in the population, along with the diversity of testing usage, has resulted in smaller identified groups of test takers. In addition, computer adaptive testing sometimes results in a relatively small number of items being used for a particular assessment. Statistical techniques that can effectively detect differential item functioning (DIF) when the population is small and/or the assessment is short are therefore needed. Identification of empirically biased items is a crucial step in creating equitable and construct-valid assessments. Parshall and Miller (1995) compared the conventional asymptotic Mantel-Haenszel (MH) with the exact test (ET) for the detection of DIF with small sample sizes. Several studies have since compared the performance of MH to logistic regression (LR) under a variety of conditions. Both Swaminathan and Rogers (1990) and Hidalgo and López-Pina (2004) demonstrated that MH and LR were comparable in their detection of items with DIF. This study followed by comparing the performance of the MH, the ET, and LR when both the sample size is small and the test length is short. The purpose of this Monte Carlo simulation study was to expand on the research done by Parshall and Miller (1995) by examining power, and power with effect size measures, for each of the three DIF detection procedures. The following variables were manipulated in this study: focal group sample size, percent of items with DIF, and magnitude of DIF. For each condition, a small reference group size of 200 was utilized, as well as a short, 10-item test. The results demonstrated that, in general, LR was slightly more powerful in detecting items with DIF. In most conditions, however, power was well below the acceptable rate of 80%. As the size of the focal group and the magnitude of DIF increased, the three procedures were more likely to reach acceptable power. Also, all three procedures demonstrated the highest power for the most discriminating item. Collectively, the results from this research provide information in the area of small sample sizes and DIF detection.
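The power-estimation logic of such a simulation can be sketched compactly: generate a short test with a small focal group, test the studied item for DIF, and record the rejection rate over replications. This is a schematic in my own code, not the dissertation's design or software; it assumes statsmodels' StratifiedTable implementation of the MH test.

```python
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

def one_replication(n_ref=200, n_foc=50, dif=0.6, rng=None):
    """Simulate a 10-item test (9 clean anchors + 1 DIF item); return MH p-value."""
    rng = np.random.default_rng() if rng is None else rng
    n = n_ref + n_foc
    focal = np.r_[np.zeros(n_ref), np.ones(n_foc)]
    theta = rng.normal(0.0, 1.0, n)
    b = rng.uniform(-1.0, 1.0, 9)
    anchors = (rng.random((n, 9)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)
    studied = (rng.random(n) < 1 / (1 + np.exp(-(theta - dif * focal)))).astype(int)
    total = anchors.sum(axis=1)
    tables = []
    for s in np.unique(total):
        k = total == s
        t = np.array([[int(np.sum(studied[k] * (1 - focal[k]))),        # ref correct
                       int(np.sum((1 - studied[k]) * (1 - focal[k])))],  # ref incorrect
                      [int(np.sum(studied[k] * focal[k])),               # focal correct
                       int(np.sum((1 - studied[k]) * focal[k]))]])       # focal incorrect
        if t.sum(axis=1).min() > 0:   # keep strata containing both groups
            tables.append(t)
    return StratifiedTable(tables).test_null_odds(correction=True).pvalue

rng = np.random.default_rng(2)
rejections = [one_replication(rng=rng) < 0.05 for _ in range(200)]
print(f"estimated power at alpha = .05: {np.mean(rejections):.2f}")
```

Setting dif=0 in the same loop yields the empirical Type I error rate, the other quantity these comparisons report.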
APA, Harvard, Vancouver, ISO, and other styles
45

Carter, Nathan T. "Applications of Differential Functioning Methods to the Generalized Graded Unfolding Model." Bowling Green State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1290885927.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Thurman, Carol Jenetha. "A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning in Polytomous Items." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/eps_diss/48.

Full text
Abstract:
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the groups. Determining whether the difference in performance on an item between two demographic groups is due to between-group differences in ability or some form of unfairness in the item is a more complex task for a polytomous item, because of its many score categories, than for a dichotomous item. Effective DIF detection methods must be able to locate DIF within each of these various score categories. The Mantel, Generalized Mantel-Haenszel (GMH), and Logistic Regression (LR) are three of several DIF detection methods that are able to test for DIF in polytomous items. There have been relatively few studies on the effectiveness of polytomous procedures to detect DIF, and of those studies, only a very small percentage have examined the efficiency of the Mantel, GMH, and LR procedures when item discrimination magnitudes and category intersection parameters vary and when there are different patterns of DIF (e.g., balanced versus constant) within score categories. This Monte Carlo simulation study compared the Type I error and power of the Mantel, GMH, and OLR (the LR method for ordinal data) procedures when variation occurred in (1) the item discrimination parameters, (2) the category intersection parameters, (3) the DIF patterns within score categories, and (4) the average latent traits between the reference and focal groups. Results of this investigation showed that high item discrimination levels were directly related to increased DIF detection rates. The location of the difficulty parameters was also found to have a direct effect on DIF detection rates. Additionally, depending on item difficulty, DIF magnitudes and patterns within score categories were found to impact DIF detection rates, and finally, DIF detection power increased as DIF magnitudes became larger. The GMH outperformed the Mantel and OLR and is recommended for use with polytomous data when item discrimination varies across items.
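For reference, the Mantel statistic for an ordinal item can be written as follows; the notation is generic (j indexes the J score categories with scores y_j, k indexes the matching-score strata) and is my rendering rather than the dissertation's.

```latex
% Mantel (1963) statistic for an ordinal item, summed over strata k:
M^2 = \frac{\Big[\sum_k F_k - \sum_k E(F_k)\Big]^{2}}{\sum_k \operatorname{Var}(F_k)},
\qquad F_k = \sum_{j=1}^{J} y_j \, n_{Fjk},
% with expectation and variance under the hypergeometric null:
E(F_k) = \frac{n_{F\cdot k}}{n_{\cdot\cdot k}} \sum_j y_j \, n_{\cdot jk},
\qquad
\operatorname{Var}(F_k) = \frac{n_{R\cdot k}\, n_{F\cdot k}}{n_{\cdot\cdot k}^{2}\,(n_{\cdot\cdot k}-1)}
\Big[ n_{\cdot\cdot k} \sum_j y_j^{2}\, n_{\cdot jk} - \Big( \sum_j y_j \, n_{\cdot jk} \Big)^{2} \Big].
```

M^2 is referred to a chi-square distribution with one degree of freedom; the GMH instead treats the J categories nominally and uses J - 1 degrees of freedom, which is why it can pick up DIF patterns that cancel across categories.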
APA, Harvard, Vancouver, ISO, and other styles
47

Garrett, Phyllis Lorena. "A Monte Carlo Study Investigating Missing Data, Differential Item Functioning, and Effect Size." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/eps_diss/35.

Full text
Abstract:
The use of polytomous items in assessments has increased over the years, and as a result, the validity of these assessments has been a concern. Differential item functioning (DIF) and missing data are two factors that may adversely affect assessment validity. Both factors have been studied separately, but DIF and missing data are likely to occur simultaneously in real assessment situations. This study investigated the Type I error and power of several DIF detection methods and methods of handling missing data for polytomous items generated under the partial credit model. The Type I error and power of the Mantel and ordinal logistic regression were compared using within-person mean substitution and multiple imputation when data were missing completely at random. In addition to assessing the Type I error and power of DIF detection methods and methods of handling missing data, this study also assessed the impact of missing data on the effect size measure associated with the Mantel, the standardized mean difference effect size measure, and ordinal logistic regression, the R-squared effect size measure. Results indicated that the performance of the Mantel and ordinal logistic regression depended on the percent of missing data in the data set, the magnitude of DIF, and the sample size ratio. The Type I error for both DIF detection methods varied based on the missing data method used to impute the missing data. Power to detect DIF increased as DIF magnitude increased, but there was a relative decrease in power as the percent of missing data increased. Additional findings indicated that the percent of missing data, DIF magnitude, and sample size ratio also influenced the effect size measures associated with the Mantel and ordinal logistic regression. The effect size values for both DIF detection methods generally increased as DIF magnitude increased, but as the percent of missing data increased, the effect size values decreased.
APA, Harvard, Vancouver, ISO, and other styles
48

Wood, Scott William. "Differential item functioning procedures for polytomous items when examinee sample sizes are small." Diss., University of Iowa, 2011. https://ir.uiowa.edu/etd/1110.

Full text
Abstract:
As part of test score validity, differential item functioning (DIF) is a quantitative characteristic used to evaluate potential item bias. In applications where a small number of examinees take a test, statistical power of DIF detection methods may be affected. Researchers have proposed modifications to DIF detection methods to account for small focal group examinee sizes for the case when items are dichotomously scored. These methods, however, have not been applied to polytomously scored items. Simulated polytomous item response strings were used to study the Type I error rates and statistical power of three popular DIF detection methods (Mantel test/Cox's β, Liu-Agresti statistic, HW3) and three modifications proposed for contingency tables (empirical Bayesian, randomization, log-linear smoothing). The simulation considered two small sample size conditions, the case with 40 reference group and 40 focal group examinees and the case with 400 reference group and 40 focal group examinees. In order to compare statistical power rates, it was necessary to calculate the Type I error rates for the DIF detection methods and their modifications. Under most simulation conditions, the unmodified, randomization-based, and log-linear smoothing-based Mantel and Liu-Agresti tests yielded Type I error rates around 5%. The HW3 statistic was found to yield higher Type I error rates than expected for the 40 reference group examinees case, rendering power calculations for these cases meaningless. Results from the simulation suggested that the unmodified Mantel and Liu-Agresti tests yielded the highest statistical power rates for the pervasive-constant and pervasive-convergent patterns of DIF, as compared to other DIF method alternatives. Power rates improved by several percentage points if log-linear smoothing methods were applied to the contingency tables prior to using the Mantel or Liu-Agresti tests. Power rates did not improve if Bayesian methods or randomization tests were applied to the contingency tables prior to using the Mantel or Liu-Agresti tests. ANOVA tests showed that statistical power was higher when 400 reference examinees were used versus 40 reference examinees, when impact was present among examinees versus when impact was not present, and when the studied item was excluded from the anchor test versus when the studied item was included in the anchor test. Statistical power rates were generally too low to merit practical use of these methods in isolation, at least under the conditions of this study.
APA, Harvard, Vancouver, ISO, and other styles
49

Liu, Ruixue. "Differential Item Functioning Among English Language Learners on a Large-Scale Mathematics Assessment." UKnowledge, 2019. https://uknowledge.uky.edu/edsc_etds/50.

Full text
Abstract:
English language learner (ELL) is a term used to describe students who are still acquiring English proficiency. In recent decades, ELLs have been a very rapidly growing student group in the United States. In school classrooms, ELLs learn English and their academic subjects simultaneously, and it is challenging for them to follow lectures, read textbooks, and complete tests in English given their still-developing English language proficiency (Ilich, 2013). As a result, the increasing number of ELLs in public schools has been paralleled by persistently low ELL mathematics performance (NCES, 2016). Given the popularization of international large-scale assessments in the recent decade, it is necessary to analyze their psychometric properties (e.g., reliability, validity) so that their results can provide evidence-based implications for policymakers. Educational researchers need to assess validity for subgroups within each country. The Programme for International Student Assessment (PISA), one of the most influential large-scale assessments, allows researchers to investigate academic achievement and group membership from a variety of different viewpoints. The current study sought to understand the nature and potential sources of the gaps in mathematics achievement between ELLs and non-ELLs. The nature of the achievement gap was examined using three DIF methodologies, the Mantel-Haenszel procedure, Rasch analysis, and the Hierarchical Generalized Linear Model (HGLM), at the item level instead of the total-test level. Among the three methods, HGLM was utilized to examine the potential sources of DIF. This method can take into account the nested structure of the data, where items are nested within students and students are nested within schools. At the student level, sources of DIF were investigated through students' variations in mathematics self-efficacy, language proficiency, and student socioeconomic status. At the school level, school type and school educational resources were investigated as potential sources of DIF after controlling for the student variables. The U.S. sample from PISA 2012 was used, and 76 dichotomously coded items from the PISA 2012 mathematics assessment were included to detect DIF effects. Results revealed that ten common items were identified with DIF effects using the MH procedure, Rasch analysis, and HGLM; all ten items favored non-ELLs. The decrease in the number of items showing DIF effects in HGLM after controlling for student-level variables revealed that mathematics self-efficacy, language proficiency, and SES are potential sources of DIF between ELLs and non-ELLs. In addition, the number of DIF items continued to decrease after controlling for both student- and school-level variables, indicating that school type and school educational resources were also potential sources of DIF between ELLs and non-ELLs. Findings from this study can help educational researchers, administrators, and policymakers understand the nature of the gap at the item level instead of the total-test level so that the United States can be competitive in middle school mathematics education. This study can also help guide item writers and test developers in the construction of more linguistically accessible assessments for students who are still learning English. The significance of this study lies in its empirical investigation of the gap between ELLs and non-ELLs in mathematics achievement at the item level and from the perspectives of both students and schools.
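A schematic of the HGLM formulation of DIF (in the spirit of Kamata, 2001) may clarify how covariates "explain" DIF in this design. The covariate names are placeholders for the student- and school-level predictors described above, not the study's exact specification.

```latex
% Multilevel logistic model for student j in school s answering item i:
\operatorname{logit} P(y_{ij} = 1) = \theta_j - \beta_i + \delta_i \, \mathrm{ELL}_j ,
\qquad \theta_j \sim N(0, \sigma_\theta^2),
% where \delta_i \ne 0 flags DIF on item i. Adding student-level predictors
% X_{qj} (self-efficacy, proficiency, SES) and school-level predictors
% W_{rs} (school type, resources):
\operatorname{logit} P(y_{ij} = 1) = \theta_j - \beta_i + \delta_i \, \mathrm{ELL}_j
+ \sum_q \gamma_q X_{qj} + \sum_r \lambda_r W_{rs}.
```

Shrinkage of the remaining delta estimates toward zero after the covariates enter is what identifies those covariates as potential sources of the ELL/non-ELL DIF.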
APA, Harvard, Vancouver, ISO, and other styles
50

Raiford-Ross, Terris. "The Impact of Multidimensionality on the Detection of Differential Bundle Functioning Using SIBTEST." Digital Archive @ GSU, 2008. http://digitalarchive.gsu.edu/eps_diss/14.

Full text
Abstract:
In response to public concern over fairness in testing, conducting a differential item functioning (DIF) analysis is now standard practice for many large-scale testing programs (e.g., the Scholastic Aptitude Test, intelligence tests, licensing exams). As highlighted by the Standards for Educational and Psychological Testing manual, the legal and ethical need to avoid bias when measuring examinee abilities is essential to fair testing practices (AERA-APA-NCME, 1999). Likewise, the development of statistical and substantive methods of investigating DIF is crucial to the goal of designing fair and valid educational and psychological tests. Douglas, Roussos and Stout (1996) introduced the concept of item bundle DIF and the implications of differential bundle functioning (DBF) for identifying the underlying causes of DIF. Since then, several studies have demonstrated DIF/DBF analyses within the framework of "unintended" multidimensionality (Oshima & Miller, 1992; Russell, 2005). Russell (2005), in particular, examined the effect of secondary traits on DBF/DTF detection. Following Russell, this study created item bundles by including multidimensional items on a simulated test designed, in theory, to be unidimensional. Simulating reference group members with a higher mean ability than the focal group on the nuisance secondary dimension resulted in DIF for each of the multidimensional items, which, when examined together, produced differential bundle functioning. The purpose of this Monte Carlo simulation study was to assess the Type I error and power performance of SIBTEST (Simultaneous Item Bias Test; Shealy & Stout, 1993a) for DBF analysis under various conditions with simulated data. The variables of interest included sample size and ratios of reference to focal group sample sizes, correlation between primary and secondary dimensions, magnitude of DIF/DBF, and angular item direction. Results showed SIBTEST to be quite powerful in detecting DBF and controlling Type I error for almost all of the simulated conditions. Specifically, power rates were .80 or above for 84% of all conditions, and the average Type I error rate was approximately .05. Furthermore, the combined effect of the studied variables on SIBTEST power and Type I error rates provided much-needed information to guide further use of SIBTEST for identifying potential sources of differential item/bundle functioning.
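SIBTEST's bundle statistic can be summarized as follows (my notation, after Shealy & Stout, 1993); the regression-correction step that adjusts the stratum means for measurement error in the matching subtest is behind the starred means and is omitted from the display for brevity.

```latex
% SIBTEST estimator of uniform bundle DIF/DBF and its standardized test statistic:
\hat{\beta}_{\mathrm{UNI}} = \sum_{k} \hat{p}_k \left( \bar{Y}^{*}_{Rk} - \bar{Y}^{*}_{Fk} \right),
\qquad
B = \frac{\hat{\beta}_{\mathrm{UNI}}}{\hat{\sigma}\!\left(\hat{\beta}_{\mathrm{UNI}}\right)}
\;\sim\; N(0,1) \text{ under } H_0 ,
% where k indexes valid-subtest (matching) score strata, \hat{p}_k is the
% proportion of focal-group examinees in stratum k, and \bar{Y}^{*}_{gk} is
% the regression-corrected mean score of group g on the studied bundle.
```

A positive estimate indicates differential functioning against the focal group; testing a bundle rather than a single item is what gives the method its power to detect the cumulative effect of a nuisance dimension shared across items.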
APA, Harvard, Vancouver, ISO, and other styles
