Journal articles on the topic 'Differential Item Functioning (DIF)'

Consult the top 50 journal articles for your research on the topic 'Differential Item Functioning (DIF).'

1

Prieto-Marañón, Pedro, María Ester Aguerri, María Silvia Galibert, and Horacio Félix Attorresi. "Detection of Differential Item Functioning." Methodology 8, no. 2 (August 1, 2012): 63–70. http://dx.doi.org/10.1027/1614-2241/a000038.

Abstract:
This study analyzes Differential Item Functioning (DIF) with three combined decision rules and compares the results with the variation of the Mantel-Haenszel procedure (vaMH) proposed by Mazor, Clauser, and Hambleton (1994). One decision rule combines the Mantel-Haenszel procedure (MH) with the Breslow-Day test of trend in odds ratio heterogeneity (BDT), with the Bonferroni adjustment applied, as Randall Penfield proposed. The second uses both MH and BDT without the Bonferroni adjustment. The third combines MH with the Breslow-Day test for homogeneity of the odds ratio without the Bonferroni adjustment. The three decision rules yielded satisfactory results, showed similar power, and none of them detected DIF erroneously. The second rule proved to be the most powerful in the presence of nonuniform DIF. Only in the presence of uniform DIF with the smallest difference in difficulty parameters was there evidence of vaMH's superiority.
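
The Mantel-Haenszel procedure recurs throughout this list, so a compact reference implementation may help orient readers. The following is a minimal sketch (ours, not the authors') of the basic MH DIF statistic on the ETS delta scale; it matches examinees on total score and omits the Breslow-Day step and continuity corrections that the decision rules above build in:

```python
# Minimal sketch of the Mantel-Haenszel DIF statistic for one studied item.
# Examinees are stratified by total score; a 2x2 (group x correct) table is
# formed within each stratum. Names and simplifications are illustrative.
import numpy as np

def mantel_haenszel_dif(correct, group, total_score):
    """correct: 0/1 responses to the studied item; group: 0 = reference,
    1 = focal; total_score: matching variable (e.g., raw test score)."""
    num, den = 0.0, 0.0
    for k in np.unique(total_score):
        m = total_score == k
        a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n  # reference-correct x focal-incorrect
            den += b * c / n  # reference-incorrect x focal-correct
    alpha_mh = num / den      # common odds ratio across score strata
    # ETS delta scale; negative values indicate DIF against the focal group
    return -2.35 * np.log(alpha_mh)
```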
2

Walker, Cindy M., Bo Zhang, Kathleen Banks, and Kevin Cappaert. "Establishing Effect Size Guidelines for Interpreting the Results of Differential Bundle Functioning Analyses Using SIBTEST." Educational and Psychological Measurement 72, no. 3 (October 11, 2011): 415–34. http://dx.doi.org/10.1177/0013164411422250.

Abstract:
The purpose of this simulation study was to establish general effect size guidelines for interpreting the results of differential bundle functioning (DBF) analyses using simultaneous item bias test (SIBTEST). Three factors were manipulated: number of items in a bundle, test length, and magnitude of uniform differential item functioning (DIF) against the focal group in each item in a bundle. A secondary purpose was to validate the current effect size guidelines for interpreting the results of single-item DIF analyses using SIBTEST. The results of this study clearly demonstrate that ability estimation bias can only be attributed to DIF or DBF when a large number of items in a bundle are functioning differentially against focal examinees in a small way or a small number of items are functioning differentially against focal examinees in a large way. In either of these situations, the presence of DIF or DBF should be a cause for concern because it would lead one to erroneously believe that distinct groups differ in ability when in fact they do not.
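
As an aid to reading this and other SIBTEST entries, here is a minimal sketch of the uncorrected SIBTEST effect size (beta): the focal-weighted difference in mean bundle scores across matching strata. The regression correction used by the operational SIBTEST procedure is omitted, and all names are illustrative assumptions:

```python
# Uncorrected SIBTEST beta for an item bundle: weighted mean-score difference
# between reference and focal examinees matched on a "valid subtest" score.
import numpy as np

def sibtest_beta(bundle_score, group, valid_score):
    """bundle_score: score on the studied item or bundle; group: 0 = reference,
    1 = focal; valid_score: total on the presumed DIF-free matching items."""
    beta, n_focal = 0.0, np.sum(group == 1)
    for k in np.unique(valid_score):
        m = valid_score == k
        ref, foc = m & (group == 0), m & (group == 1)
        if ref.sum() == 0 or foc.sum() == 0:
            continue                  # stratum empty for one group; skip it
        w_k = foc.sum() / n_focal     # weight: focal-group proportion at stratum k
        beta += w_k * (bundle_score[ref].mean() - bundle_score[foc].mean())
    return beta                       # positive: bundle disadvantages the focal group
```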
3

French, Brian F., and Thao T. Vo. "Differential Item Functioning of a Truancy Assessment." Journal of Psychoeducational Assessment 38, no. 5 (July 19, 2019): 642–48. http://dx.doi.org/10.1177/0734282919863215.

Abstract:
The Washington Assessment of Risk and Needs of Students (WARNS) is a brief self-report measure designed for schools, courts, and youth service providers to identify student behaviors and contexts related to school truancy. Empirical support for WARNS item invariance between ethnic groups is lacking. This study examined differential item functioning (DIF) to ensure that items on the WARNS function similarly across groups, especially for groups where truancy rates are highest. The item response theory graded response model was used to examine DIF between Caucasian, African American, and Latinx students. DIF was identified in six items across WARNS domains. The DIF amount and magnitude likely will not influence decisions based on total scores. Implications for practice and suggestions for an ecological framework to explain the DIF results are discussed.
4

Johanson, George A. "Differential Item Functioning in Attitude Assessment." Evaluation Practice 18, no. 2 (June 1997): 127–35. http://dx.doi.org/10.1177/109821409701800204.

Abstract:
Differential item functioning (DIF) is not often seen in the literature on attitude assessment. A brief discussion of DIF and methods of implementation is followed by an illustrative example from a program evaluation, using an attitude-towards-science scale with 1550 children in grades one through six. An item exhibiting substantial DIF with respect to gender was detected using the Mantel-Haenszel procedure. In a second example, data from workshop evaluations with 1682 adults were recoded to a binary format, and it was found that an item suspected of functioning differentially with respect to age groups was, in fact, not doing so. Implications for evaluation practice are discussed.
5

Alwi, Idrus. "Sensitivity of Mantel Haenszel Model and Rasch Model as Viewed from Sample Size." Jurnal Evaluasi Pendidikan 2, no. 1 (May 9, 2017): 18. http://dx.doi.org/10.21009/jep.021.02.

Abstract:
The aim of this research is to compare the sensitivity of the Mantel-Haenszel method and the Rasch model for detecting differential item functioning (DIF), viewed as a function of sample size. The two DIF detection methods were compared using simulated binary item response data sets of varying sample size: 200 and 400 examinees were used in the analyses, with DIF defined by gender difference. Each test condition was replicated 4 times. For both DIF detection methods, a test length of 42 items was sufficient for satisfactory DIF detection, with the detection rate increasing as sample size increased. The empirical results show that the Rasch model is more sensitive in detecting DIF than the Mantel-Haenszel method. Based on these findings, the use of the Rasch model is recommended in evaluation activities involving multiple-choice tests. For this purpose, every school needs teachers who are skilled in analyzing test results using modern methods (item response theory).
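
For readers comparing the two approaches above, the Rasch model and the item-difficulty contrast commonly used to quantify uniform DIF can be stated compactly (generic notation, not taken from the article):

```latex
% Rasch model: ability theta_i, difficulty b_j; uniform DIF for item j is
% the difficulty difference between focal (F) and reference (R) groups.
P(X_{ij}=1 \mid \theta_i) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)},
\qquad \mathrm{DIF}_j = b_j^{F} - b_j^{R}
```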
6

Wetzel, Eunike, and Benedikt Hell. "Gender-Related Differential Item Functioning in Vocational Interest Measurement." Journal of Individual Differences 34, no. 3 (August 1, 2013): 170–83. http://dx.doi.org/10.1027/1614-0001/a000112.

Abstract:
Large mean differences are consistently found in the vocational interests of men and women. These differences may be attributable to real differences in the underlying traits. However, they may also depend on the properties of the instrument being used. It is conceivable that, in addition to the intended dimension, items assess a second dimension that differentially influences responses by men and women. This question is addressed in the present study by analyzing a widely used German interest inventory (Allgemeiner Interessen-Struktur-Test, AIST-R) regarding differential item functioning (DIF) using a DIF estimate in the framework of item response theory. Furthermore, the impact of DIF at the scale level is investigated using differential test functioning (DTF) analyses. Several items on the AIST-R’s scales showed significant DIF, especially on the Realistic, Social, and Enterprising scales. Removal of DIF items reduced gender differences on the Realistic scale, though gender differences on the Investigative, Artistic, and Social scales remained practically unchanged. Thus, responses to some AIST-R items appear to be influenced by a secondary dimension apart from the interest domain the items were intended to measure.
7

Jin, Kuan-Yu, Hui-Fang Chen, and Wen-Chung Wang. "Using Odds Ratios to Detect Differential Item Functioning." Applied Psychological Measurement 42, no. 8 (March 21, 2018): 613–29. http://dx.doi.org/10.1177/0146621618762738.

Abstract:
Differential item functioning (DIF) makes test scores incomparable and substantially threatens test validity. Although conventional approaches, such as the logistic regression (LR) and the Mantel–Haenszel (MH) methods, have worked well, they are vulnerable to high percentages of DIF items in a test and to missing data. This study developed a simple but effective method to detect DIF using the odds ratio (OR) of two groups' responses to a studied item. The OR method uses all available information from examinees' responses, and it can eliminate the potential influence of bias in the total scores. Through a series of simulation studies in which the DIF pattern, impact, sample size (equal/unequal), purification procedure (with/without), percentage of DIF items, and proportion of missing data were manipulated, the performance of the OR method was evaluated and compared with the LR and MH methods. The results showed that the OR method without a purification procedure outperformed the LR and MH methods in controlling false positive rates and yielding high true positive rates when tests had a high percentage of DIF items favoring the same group. In addition, only the OR method was feasible when tests adopted the item matrix sampling design. The effectiveness of the OR method is illustrated with an empirical example.
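
To illustrate the idea of screening items on their group odds ratios, here is a minimal sketch in the spirit of the abstract above. It is our simplification, not the authors' estimator: the Haldane correction, robust centering, and flagging rule are illustrative assumptions.

```python
# Per-item log odds ratio between groups, compared against the median log-OR
# across items (the median absorbs overall impact and is robust when many
# DIF items favor the same group). Missing responses are simply skipped.
import numpy as np

def or_dif_screen(responses, group, z_crit=1.96):
    """responses: (n_persons, n_items) array of 0/1 with np.nan for missing;
    group: 0 = reference, 1 = focal."""
    log_ors = []
    for j in range(responses.shape[1]):
        r = responses[:, j]
        valid = ~np.isnan(r)
        ref, foc = valid & (group == 0), valid & (group == 1)
        n_ref_right, n_foc_right = r[ref].sum(), r[foc].sum()
        # 0.5 added to every cell (Haldane correction) to avoid zero counts
        log_ors.append(
            np.log((n_ref_right + 0.5) / (ref.sum() - n_ref_right + 0.5))
            - np.log((n_foc_right + 0.5) / (foc.sum() - n_foc_right + 0.5)))
    log_ors = np.asarray(log_ors)
    z = (log_ors - np.median(log_ors)) / log_ors.std()
    return np.abs(z) > z_crit  # True = flagged as a possible DIF item
```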
8

Rome, Logan, and Bo Zhang. "Investigating the Effects of Differential Item Functioning on Proficiency Classification." Applied Psychological Measurement 42, no. 4 (August 29, 2017): 259–74. http://dx.doi.org/10.1177/0146621617726789.

Abstract:
This study provides a comprehensive evaluation of the effects of differential item functioning (DIF) on proficiency classification. Using Monte Carlo simulation, item- and test-level DIF magnitudes were varied systematically to investigate their impact on proficiency classification at multiple decision points. Findings from this study clearly show that the presence of DIF affects proficiency classification not by lowering the overall correct classification rates but by affecting classification error rates differently for reference and focal group members. The study also reveals that multiple items with low levels of DIF can be particularly problematic. They can do similar damage to proficiency classification as high-level DIF items with the same cumulative magnitudes but are much harder to detect with current DIF and differential bundle functioning (DBF) techniques. Finally, how DIF affects proficiency classification errors at multiple cut scores is fully described and discussed.
9

Rudas, Tamás, and Rebecca Zwick. "Estimating the Importance of Differential Item Functioning." Journal of Educational and Behavioral Statistics 22, no. 1 (March 1997): 31–45. http://dx.doi.org/10.3102/10769986022001031.

Abstract:
Several methods have been proposed to detect differential item functioning (DIF), an item response pattern in which members of different demographic groups have different conditional probabilities of answering a test item correctly, given the same level of ability. In this article, the mixture index of fit, proposed by Rudas, Clogg, and Lindsay (1994), is used to estimate the fraction of the population for which DIF occurs, and this approach is compared to the Mantel-Haenszel (Mantel & Haenszel, 1959) test of DIF developed by Holland (1985; see Holland & Thayer, 1988). The proposed estimation procedure, which is noniterative, can provide information about which portions of the item response data appear to be contributing to DIF.
10

Sudaryono, Sudaryono. "Sensitivitas Metode Pendeteksian Differential Item Functioning (DIF)." Jurnal Evaluasi Pendidikan 3, no. 1 (May 9, 2017): 82. http://dx.doi.org/10.21009/jep.031.07.

Abstract:
The main aim of this research was to obtain empirical evidence on the comparative sensitivity of the Scheuneman Chi-square method, the Mantel-Haenszel method, and the Rasch model of item response theory in detecting differential item functioning (DIF). The research used an experimental method with a 1 x 3 design; the independent variable comprised the Scheuneman Chi-square method, the Mantel-Haenszel method, and the Rasch model of item response theory. Specifically, the research aimed to reveal: (1) item characteristics based on classical test theory, (2) the standard error of measurement based on classical test theory, and (3) the detection of items flagged as containing DIF with respect to gender. The analysis was based on responses of examinees who took the National Mathematics Examination at senior high schools in Tangerang in the 2008/2009 academic year. The data source consisted of the computer answer sheets of 5,000 students, comprising 2,500 male and 2,500 female students drawn by simple random sampling. Descriptive analysis using classical test theory showed that 28 of the 40 mathematics test items were acceptable, with a reliability index of 0.827. The results show that all of the methods used to detect DIF performed reasonably well, but the Rasch model of item response theory was the most sensitive, compared with the Mantel-Haenszel and Scheuneman Chi-square methods.
11

Fukuhara, Hirotaka, and Akihito Kamata. "A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items." Applied Psychological Measurement 35, no. 8 (November 2011): 604–22. http://dx.doi.org/10.1177/0146621611428447.

Abstract:
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.
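
A generic form of a bifactor testlet DIF model of the kind proposed above can be written as follows (our notation; see the article for the exact specification):

```latex
% theta_i is general ability, gamma_{i,d(j)} the testlet-specific factor for
% the testlet d(j) containing item j, G_i a group indicator, and beta_j the
% DIF magnitude for item j.
\operatorname{logit} P(Y_{ij}=1) = a_j \theta_i + a_j^{*} \gamma_{i,d(j)} - b_j + \beta_j G_i
```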
12

Zwick, Rebecca, and Dorothy T. Thayer. "Evaluating the Magnitude of Differential Item Functioning in Polytomous Items." Journal of Educational and Behavioral Statistics 21, no. 3 (September 1996): 187–201. http://dx.doi.org/10.3102/10769986021003187.

Abstract:
Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in polytomous test items that are scored on an ordinal scale. Mantel's extension of the Mantel-Haenszel test is one of several hypothesis-testing methods for this purpose. The development of descriptive statistics for characterizing DIF in polytomous test items has received less attention. As a step in this direction, two possible standard error formulas for the polytomous DIF index proposed by Dorans and Schmitt were derived. These standard errors, as well as associated hypothesis-testing procedures, were evaluated through application to simulated data. The standard error that performed better is based on Mantel's hypergeometric model. The alternative standard error, based on a multinomial model, tended to yield values that were too small.
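
For reference, the polytomous DIF index of Dorans and Schmitt discussed above is a standardized mean difference of the following generic form (our notation):

```latex
% Expected item scores of focal (F) and reference (R) examinees are compared
% within matching-score strata k and weighted by the focal group's
% distribution over strata.
\mathrm{SMD} = \sum_{k} w_{Fk}\left(\bar{Y}_{Fk} - \bar{Y}_{Rk}\right),
\qquad w_{Fk} = \frac{n_{Fk}}{\sum_{k'} n_{Fk'}}
```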
13

Huang, Jinyan, and Turgay Han. "Revisiting Differential Item Functioning: Implications for Fairness Investigation." International Journal of Education 4, no. 2 (April 17, 2012): 74. http://dx.doi.org/10.5296/ije.v4i2.1654.

Abstract:
Fairness has been a priority in educational assessment during the past few decades, and differential item functioning (DIF) has become an important statistical procedure in the investigation of assessment fairness. For any given large-scale assessment, DIF evaluation is recommended as a standard procedure by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. This procedure often affords opportunities to check for group differences in test performance and to investigate whether or not these differences indicate bias. However, current DIF research has received several criticisms. Revisiting DIF, this paper critically reviews current DIF research and proposes new directions for DIF research in the investigation of assessment fairness.
14

Van den Noortgate, Wim, and Paul De Boeck. "Assessing and Explaining Differential Item Functioning Using Logistic Mixed Models." Journal of Educational and Behavioral Statistics 30, no. 4 (December 2005): 443–64. http://dx.doi.org/10.3102/10769986030004443.

Abstract:
Although differential item functioning (DIF) theory traditionally focuses on the behavior of individual items in two (or a few) specific groups, in educational measurement contexts, it is often plausible to regard the set of items as a random sample from a broader category. This article presents logistic mixed models that can be used to model uniform DIF, treating the item effects and their interaction with groups (DIF) as random. In a similar way, the group effects can be modeled as random instead of fixed, if the groups can be considered a random sample from a population of groups. The models can, furthermore, be adapted easily for modeling DIF over individual persons rather than over groups, or for modeling the differential functioning of groups of items instead of individual items. It is shown that the logistic mixed model approach is not only a comprehensive and economical way to detect these different kinds of DIF, it also encourages us to explore possible explanations of DIF by including group or item covariates in the model.
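
A uniform-DIF logistic mixed model in the spirit of this abstract can be sketched as follows (generic notation; the article's models are richer):

```latex
% Person ability theta_p, item effect b_i, fixed effect beta for person p's
% group g(p), and random item-by-group interactions u_{i,g} whose variance
% captures how much (random) DIF the items exhibit.
\operatorname{logit} P(Y_{pi}=1) = \theta_p + b_i + \beta_{g(p)} + u_{i,g(p)},
\qquad u_{i,g} \sim N(0, \sigma_u^2)
```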
15

Swanson, David B., Brian E. Clauser, Susan M. Case, Ronald J. Nungester, and Carol Featherman. "Analysis of Differential Item Functioning (DIF) Using Hierarchical Logistic Regression Models." Journal of Educational and Behavioral Statistics 27, no. 1 (March 2002): 53–75. http://dx.doi.org/10.3102/10769986027001053.

Abstract:
Over the past 25 years a range of parametric and nonparametric methods have been developed for analyzing Differential Item Functioning (DIF). These procedures are typically performed for each item individually or for small numbers of related items. Because the analytic procedures focus on individual items, it has been difficult to pool information across items to identify potential sources of DIF analytically. In this article, we outline an approach to DIF analysis using hierarchical logistic regression that makes it possible to combine results of logistic regression analyses across items to identify consistent sources of DIF, to quantify the proportion of explained variation in DIF coefficients, and to compare the predictive accuracy of alternate explanations for DIF. The approach can also be used to improve the accuracy of DIF estimates for individual items by applying empirical Bayes techniques, with DIF-related item characteristics serving as collateral information. To illustrate the hierarchical logistic regression procedure, we use a large data set derived from recent computer-based administrations of Step 2, the clinical science component of the United States Medical Licensing Examination (USMLE®). Results of a small Monte Carlo study of the accuracy of the DIF estimates are also reported.
16

Roussos, Louis A., Deborah L. Schnipke, and Peter J. Pashley. "A Generalized Formula for the Mantel-Haenszel Differential Item Functioning Parameter." Journal of Educational and Behavioral Statistics 24, no. 3 (September 1999): 293–322. http://dx.doi.org/10.3102/10769986024003293.

Abstract:
The present study derives a general formula for the population parameter being estimated by the Mantel-Haenszel (MH) differential item functioning (DIF) statistic. Because the formula is general, it is appropriate for either uniform DIF (defined as a difference in item response theory item difficulty values) or nonuniform DIF, and it can be used regardless of the form of the item response function. In the case of uniform DIF modeled with two-parameter-logistic response functions, the parameter is well known to be linearly related to the difference in item difficulty between the focal and reference groups. Even though this relationship is known not to hold strictly in the case of three-parameter-logistic (3PL) uniform DIF, the degree of the departure from this relationship has not been known and has generally been believed to be small. By evaluating the MH DIF parameter, we show that for items of medium or high difficulty, the parameter is much smaller in absolute value than expected based on the difference in item difficulty between the two groups. These results shed new light on results from previous simulation studies that showed the MH DIF statistic has a tendency to shrink toward zero with increasing difficulty level when used with 3PL data.
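
Two standard relations referenced in this abstract, in generic notation (the Rasch-case identity is due to Holland and Thayer, 1988):

```latex
% The MH procedure estimates a common odds ratio alpha_MH, reported on the
% ETS delta scale as MH D-DIF; under the Rasch model with uniform DIF, the
% log odds ratio equals the focal-reference difficulty difference.
\text{MH D-DIF} = -2.35 \ln \alpha_{MH},
\qquad \ln \alpha_{MH} = b^{F} - b^{R} \quad \text{(Rasch, uniform DIF)}
```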
17

Aksu Dunya, Beyza, Clark McKown, and Everett Smith. "Psychometric Properties and Differential Item Functioning of a Web-Based Assessment of Children’s Emotion Recognition Skill." Journal of Psychoeducational Assessment 38, no. 5 (November 6, 2019): 627–41. http://dx.doi.org/10.1177/0734282919881919.

Abstract:
Emotion recognition (ER) involves understanding what others are feeling by interpreting nonverbal behavior, including facial expressions. The purpose of this study is to evaluate the psychometric properties of a web-based social ER assessment designed for children in kindergarten through third grade. Data were collected from two separate samples of children: the first sample included 3,224 children and the second sample included 4,419 children. Data were calibrated using the Rasch dichotomous model. Differential item and test functioning were also evaluated across gender and ethnicity. Across both samples, we found consistent item fit, a unidimensional item structure, and adequate item targeting. Analyses of differential item functioning (DIF) found six out of 111 items displaying DIF across gender and no items demonstrating DIF across ethnicity. Analyses of person measure calibrations with and without the DIF items yielded no evidence of differential test functioning (DTF) across gender and ethnicity groups in either sample.
18

De Boeck, Paul, Sun-Joo Cho, and Mark Wilson. "Explanatory Secondary Dimension Modeling of Latent Differential Item Functioning." Applied Psychological Measurement 35, no. 8 (November 2011): 583–603. http://dx.doi.org/10.1177/0146621611428446.

Abstract:
The models used in this article are secondary dimension mixture models with the potential to explain differential item functioning (DIF) between latent classes, called latent DIF. The focus is on models with a secondary dimension that is at the same time specific to the DIF latent class and linked to an item property. A description of the models is provided along with a means of estimating model parameters using easily available software and a description of how the models behave in two applications. One application concerns a test that is sensitive to speededness and the other is based on an arithmetic operations test where the division items show latent DIF.
19

Wetzel, Eunike, Jan R. Böhnke, Claus H. Carstensen, Matthias Ziegler, and Fritz Ostendorf. "Do Individual Response Styles Matter?" Journal of Individual Differences 34, no. 2 (May 1, 2013): 69–81. http://dx.doi.org/10.1027/1614-0001/a000102.

Abstract:
The occurrence of differential item functioning (DIF) for gender indicates that an instrument may not be functioning equivalently for men and women. Aside from DIF effects, item responses in personality questionnaires can also be influenced by response styles. This study analyzes the German NEO-PI-R regarding its differential item functioning for men and women while taking response styles into account. To this end, mixed Rasch models were estimated first to identify latent classes that differed in their response style. These latent classes were identified as extreme response style (ERS) and nonextreme response style (NERS). Then, DIF analyses were conducted separately for the different response styles and compared with DIF results for the complete sample. Several items especially on Neuroticism, Agreeableness, and Conscientiousness facets showed gender-DIF and thus function differentially between men and women. DIF results differed mainly in size between the complete sample and the response style subsamples, though DIF classification was overall consistent between ERS, NERS, and the complete sample.
20

Berger, Moritz, and Gerhard Tutz. "Detection of Uniform and Nonuniform Differential Item Functioning by Item-Focused Trees." Journal of Educational and Behavioral Statistics 41, no. 6 (July 28, 2016): 559–92. http://dx.doi.org/10.3102/1076998616659371.

Abstract:
Detection of differential item functioning (DIF) by use of the logistic modeling approach has a long tradition. One big advantage of the approach is that it can be used to investigate nonuniform (NUDIF) as well as uniform DIF (UDIF). The classical approach allows one to detect DIF by distinguishing between multiple groups. We propose an alternative method that combines recursive partitioning methods (trees) and logistic regression methodology to detect UDIF and NUDIF in a nonparametric way. The output of the method is a set of trees that visualize, in a simple way, the structure of DIF in an item, showing which variables interact, and in which way, when generating DIF. In addition, we consider a logistic regression method in which DIF can be induced by a vector of covariates, which may include categorical as well as continuous covariates. The methods are investigated in simulation studies and illustrated by two applications.
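
For orientation, the classical logistic regression DIF model that such tree methods extend is shown below in generic notation: a nonzero group coefficient signals uniform DIF, and a nonzero interaction signals nonuniform DIF.

```latex
% S is the matching score and G the group indicator; beta_2 != 0 indicates
% uniform DIF, beta_3 != 0 indicates nonuniform DIF.
\operatorname{logit} P(Y=1 \mid S, G) = \beta_0 + \beta_1 S + \beta_2 G + \beta_3 (S \times G)
```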
21

Shanmugam, S. Kanageswari Suppiah. "Determining Gender Differential Item Functioning for Mathematics in Coeducational School Culture." Malaysian Journal of Learning and Instruction 15, Number 2 (December 31, 2018): 83–109. http://dx.doi.org/10.32890/mjli2018.15.2.4.

Abstract:
Purpose - In an attempt to explore item characteristics that behave differently between boys and girls, this comparative study examines gender Differential Item Functioning in a school culture that is noted to be 'thriving' mathematically. Methodology - Twenty-four grade-eight mathematics items from the TIMSS 2003 and TIMSS 2007 released items, with equal numbers of computation and word-problem items, were administered to 460 boys and 445 girls in grade eight from three Chinese-medium coeducational secondary schools. Word-problem items were defined as items set in a real-world context. Content validity was established by constructing a table of specifications. Using the software WINSTEPS version 3.67.0, which is based on the Rasch model for dichotomous responses, Differential Item Functioning analysis was conducted with the Mantel-Haenszel chi-square method. DIF items were flagged when the Mantel-Haenszel probability value was less than 0.05 and classified as exhibiting negligible, moderate, or large DIF based on the Educational Testing Service DIF size categories. The focal and reference groups were girls and boys, respectively. The main delimitation was that substantive analysis using expert judgment was not conducted to identify biased items. Findings - Using the Mantel-Haenszel chi-square, two moderate DIF items that assess subtraction favoured girls. They assessed Knowing from the topics Whole Number and Fraction. The sources of DIF were linguistic density and item presentation style. The findings suggest that, with only two moderate DIF mathematics items, there is insufficient evidence that the mathematics items functioned differently between boys and girls in a school culture noted for successful mathematics learning, even though the linguistic complexities of the test language cannot be ignored. Significance - When constructing mathematics multiple-choice items, careful consideration needs to be given to selecting suitable numbers for the content of the item, so that only the correct algorithm produces the correct answer precisely when those numbers are used. The modest result of detecting two moderate DIF items nevertheless informs national testing agencies and teacher educators on the principles of building fair items as part of their test improvement practice in the 21st century. The novelty of this study is that gender Differential Item Functioning was studied in the context of a school culture notable for successful mathematics learning.
22

Thielemann, Desiree, Felicitas Richter, Bernd Strauss, Elmar Braehler, Uwe Altmann, and Uwe Berger. "Differential Item Functioning in Brief Instruments of Disordered Eating." European Journal of Psychological Assessment 35, no. 6 (November 2019): 823–33. http://dx.doi.org/10.1027/1015-5759/a000472.

Abstract:
Most instruments for the assessment of disordered eating were developed and validated in young female samples. However, they are often used in heterogeneous general population samples. Therefore, brief instruments of disordered eating should assess the severity of disordered eating equally well between individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males, given the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF with respect to the SCOFF seemed to be negligible. Both questionnaires are equally fair across people of different age and SES. The DIF by gender that we found with respect to the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments assessing disordered eating revealed their strengths and limitations concerning test fairness for different groups.
23

Zwick, Rebecca. "When Do Item Response Function and Mantel-Haenszel Definitions of Differential Item Functioning Coincide?" Journal of Educational Statistics 15, no. 3 (September 1990): 185–97. http://dx.doi.org/10.3102/10769986015003185.

Abstract:
A test item is typically considered free of differential item functioning (DIF) if its item response function is the same across demographic groups. A popular means of testing for DIF is the Mantel-Haenszel (MH) approach. Holland and Thayer (1988) showed that, under the Rasch model, identity of item response functions across demographic groups implies that the MH null hypothesis will be satisfied when the MH matching variable is test score, including the studied item. This result, however, cannot be generalized to the class of items for which item response functions are monotonic and local independence holds. Suppose that all item response functions are identical across groups, but the ability distributions for the two groups are stochastically ordered. In general, the population MH result will show DIF favoring the higher group on some items and the lower group on others. If the studied item is excluded from the matching criterion under these conditions, the population MH result will always show DIF favoring the higher group.
24

van Schilt-Mol, Tamara, Ton Vallen, and Henny Uiterwijk. "Onderzoek Naar Oorzaken van 'Differential Item Functioning' in de Eindtoets Basisonderwijs." Toegepaste Taalwetenschap in Artikelen 74 (January 1, 2005): 135–45. http://dx.doi.org/10.1075/ttwia.74.13sch.

Abstract:
Previous research has shown that the Dutch 'Final Test of Primary Education' contains a number of unintentionally difficult, and therefore unwanted, test items, leading to Differential Item Functioning (DIF) for immigrant minority students whose parents' dominant language is Turkish or Arabic/Berber. Two statistical procedures were used to identify DIF items in the Final Test of 1997. Subsequently, five experiments were conducted to detect causes of DIF, yielding a number of hypotheses concerning possible linguistic, cultural, and textual sources. These hypotheses were used to manipulate the original DIF items into intentionally DIF-free items. The article discusses three possible sources of DIF: (1) the use of fixed (misleading) answer options and (2) of misleading illustrations (both to the disadvantage of the minority students), and (3) the fact that questions concerning the past tense often lead to DIF (to the minority students' advantage).
25

Cheng, Chung-Ping, Chi-Chen Chen, and Ching-Lin Shih. "An Exploratory Strategy to Identify and Define Sources of Differential Item Functioning." Applied Psychological Measurement 44, no. 7-8 (June 24, 2020): 548–60. http://dx.doi.org/10.1177/0146621620931190.

Abstract:
The sources of differential item functioning (DIF) items are usually identified through a qualitative content review by a panel of experts. However, the differential functioning for some DIF items might have been caused by reasons outside of the experts’ experiences, leading to the sources for these DIF items possibly being misidentified. Quantitative methods can help to provide useful information, such as the DIF status and the number of sources of the DIF, which in turn help the item review and revision process to be more efficient and precise. However, the current quantitative methods assume all possible sources should be known in advance and collected to accompany the item response data, which is not always the case in reality. To this end, an exploratory strategy, combined with the MIMIC (multiple-indicator multiple-cause) method, that can be used to identify and name new sources of DIF is proposed in this study. The performance of this strategy was investigated through simulation. The results showed that when a set of DIF-free items can be correctly identified to define the main dimension, the proposed exploratory MIMIC method can accurately recover a number of possible sources of DIF and the items that belong to each. A real data analysis was also implemented to demonstrate how this strategy can be used in reality. The results and findings of this study are further discussed.
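
A generic MIMIC DIF formulation of the kind extended above can be written as follows (our notation):

```latex
% The latent trait eta is regressed on the grouping covariate z (impact,
% gamma); a nonzero direct effect beta_j of z on item j, over and above eta,
% indicates uniform DIF in that item.
y_j^{*} = \lambda_j \eta + \beta_j z + \varepsilon_j,
\qquad \eta = \gamma z + \zeta
```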
26

Bollmann, Stella, Moritz Berger, and Gerhard Tutz. "Item-Focused Trees for the Detection of Differential Item Functioning in Partial Credit Models." Educational and Psychological Measurement 78, no. 5 (September 25, 2017): 781–804. http://dx.doi.org/10.1177/0013164417722179.

Abstract:
Various methods to detect differential item functioning (DIF) in item response models are available. However, most of these methods assume that the responses are binary, and so for ordered response categories available methods are scarce. In the present article, DIF in the widely used partial credit model is investigated. An item-focused tree is proposed that allows the detection of DIF items, which might affect the performance of the partial credit model. The method uses tree methodology, yielding a tree for each item that is detected as DIF item. The visualization as trees makes the results easily accessible, as the obtained trees show which variables induce DIF and in which way. In the present paper, the new method is compared with alternative approaches and simulations demonstrate the performance of the method.
27

Doğan, Nuri, Ronald K. Hambleton, Meltem Yurtcu, and Sinan Yavuz. "The comparison of differential item functioning predicted through experts and statistical techniques." Cypriot Journal of Educational Sciences 13, no. 2 (June 26, 2018): 137–48. http://dx.doi.org/10.18844/cjes.v13i2.2427.

Abstract:
Validity is one of the psychometric properties of achievement tests. One way to examine it is through item bias studies, which are based on Differential Item Functioning (DIF) analyses and field experts' opinions. In this study, field experts were asked to estimate the DIF levels of the items so that their estimations could be compared with those obtained from different statistical techniques. First, the experts were asked to examine the questions and estimate the DIF levels with respect to the gender variable, and the agreement among the experts was examined. Second, DIF levels were calculated using logistic regression and the Mantel-Haenszel (MH) statistical method. Third, the experts' estimations and the statistical analysis results were compared. In conclusion, it was observed that the experts and the statistical techniques were in agreement among themselves, and that the two were partially different from each other for the Sciences test and equal for the Social Sciences test. Keywords: item bias, differential item functioning (DIF), expert estimation.
28

Ma, Wenchao, Ragip Terzi, and Jimmy de la Torre. "Detecting Differential Item Functioning Using Multiple-Group Cognitive Diagnosis Models." Applied Psychological Measurement 45, no. 1 (October 21, 2020): 37–53. http://dx.doi.org/10.1177/0146621620965745.

Abstract:
This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.
29

Walker, Cindy M., and Sakine Göçer Şahin. "Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items." Educational and Psychological Measurement 80, no. 4 (January 20, 2020): 808–20. http://dx.doi.org/10.1177/0013164419899731.

Abstract:
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared with traditional interrater reliability measures. Three different procedures that can be used as measures of interrater reliability were compared: (1) intraclass correlation coefficient (ICC), (2) Cohen’s kappa statistic, and (3) DIF statistic obtained from Poly-SIBTEST. The results of this investigation indicated that DIF procedures appear to be a promising alternative to assess the interrater reliability of constructed response items, or other polytomous types of items, such as rating scales. Furthermore, using DIF to assess interrater reliability does not require a fully crossed design and allows one to determine if a rater is either more severe, or more lenient, in their scoring of each individual polytomous item on a test or rating scale.
30

Ibrahim, Abdul Wahab. "The Applicability of Item Response Theory Based Statistics to Detect Differential Item Functioning in Polytomous Tests." Randwick International of Education and Linguistics Science Journal 1, no. 1 (June 23, 2020): 1–13. http://dx.doi.org/10.47175/rielsj.v1i1.23.

Abstract:
The study used statistical procedures based on item response theory to detect Differential Item Functioning (DIF) in polytomous tests, with a view to improving the quality of test item construction. The sample consisted of an intact class of 513 Part 3 undergraduate students who registered for the course EDU 304: Tests and Measurement at Sule Lamido University during the 2017/2018 second semester. A self-developed polytomous research instrument was used to collect data. The data collected were analysed using the generalized Mantel-Haenszel procedure, the simultaneous item bias test, and logistic discriminant function analysis. The results showed that there was no significant relationship between the proportions of test items that functioned differentially in the polytomous test when the different statistical methods were used. Further, the parametric and non-parametric methods complement each other in their ability to detect DIF in the polytomous test format, as all of them have the capacity to detect DIF but perform differently. The study concluded that there was a high degree of correspondence between the three procedures in their ability to detect DIF in polytomous tests. It was recommended that test experts and developers consider using procedures based on item response theory for DIF detection.
31

Yesiltas, Gonca, and Insu Paek. "A Log-Linear Modeling Approach for Differential Item Functioning Detection in Polytomously Scored Items." Educational and Psychological Measurement 80, no. 1 (June 10, 2019): 145–62. http://dx.doi.org/10.1177/0013164419853000.

Abstract:
A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were manipulated. Also, the performance of LLM was compared with that of other observed score–based DIF methods, namely ordinal logistic regression, logistic discriminant function analysis, Mantel, and generalized Mantel-Haenszel, regarding their Type I error (rejection rates) and power (DIF detection rates). For the observed score matching stratification in LLM, 5 and 10 strata were used. Overall, generalized Mantel-Haenszel and LLM with 10 strata showed better performance than other methods, whereas ordinal logistic regression and Mantel showed poor performance in detecting balanced DIF where the DIF direction is opposite in the two pairs of categories and partial DIF where DIF exists only in some of the categories.
32

Pires, Jeferson Gervasio, and Carlos Henrique Sancineto da Silva Nunes. "Differential item functioning analysis on a measure of mindfulness (MAP)." Psico 49, no. 2 (August 24, 2018): 101. http://dx.doi.org/10.15448/1980-8623.2018.2.27229.

Abstract:
Analysis of differential item functioning (DIF) is of great importance when developing or validating psychological instruments, since it makes it possible to identify whether a given instrument is biased with respect to sample characteristics. Given this importance, in this study we tested for DIF in the items of a new instrument for the measurement of mindfulness (MAP) with respect to the sample's sex, age, experience with meditation, and use of alternative medicine. To this end, 788 Brazilian adults, with a mean age of 26 years (SD = 9.59), most of them women (79%) and single, responded to the MAP. Overall, no moderate or large DIF was identified in the positively worded items, indicating that the analyzed items do not specifically favor any of the groups tested in the present study.
33

Yao, Don. "Gender-related Differential Item Functioning Analysis on an ESL Test." Journal of Language Testing & Assessment 3, no. 1 (2020): 5–19. http://dx.doi.org/10.23977/langta.2020.030102.

Abstract:
Differential item functioning (DIF) is a technique used to examine whether items function differently across different groups. DIF analysis helps detect bias in an assessment to ensure its fairness. However, most previous research has focused on high-stakes assessments; there is a dearth of research on low-stakes assessments, which are also significant for the test development and validation process. Additionally, gender difference in test performance is always a particular concern for researchers evaluating whether a test is fair. The present study investigated whether test items of the General English Proficiency Test for Kids (GEPT-Kids) are free of bias in terms of gender differences. A mixed-method sequential explanatory research design was adopted, with two phases. In phase I, test performance data of 492 participants from five Chinese-speaking cities were analyzed with the Mantel-Haenszel (MH) method to detect gender DIF. In phase II, items that manifested DIF were subjected to content analysis by three experienced reviewers to identify possible sources of DIF. The results showed that three items were detected with moderate gender DIF through statistical methods, and three items were identified as possibly biased by expert judgment. The results provide preliminary contributions to DIF analysis for low-stakes assessment in the field of language assessment. In addition, renewed attention has been drawn to young language learners, especially in the Chinese context; the results may thus also add to the body of literature that can shed some light on test development for young language learners.
34

Hidalgo-Montesinos, M. Dolores, and Juana Gómez-Benito. "Test Purification and the Evaluation of Differential Item Functioning with Multinomial Logistic Regression." European Journal of Psychological Assessment 19, no. 1 (March 2003): 1–11. http://dx.doi.org/10.1027//1015-5759.19.1.1.

Abstract:
We conducted a computer simulation study to determine the effect of using an iterative or noniterative multinomial logistic regression analysis (MLR) to detect differential item functioning (DIF) in polytomous items. A simple iteration, in which ability is defined as the total observed score on the test, is compared with a two-step MLR in which ability is purified by eliminating the DIF items. Data were generated to simulate several biased tests. The factors manipulated were: DIF effect size (0.5, 1.0, and 1.5), percentage of DIF items in the test (0%, 10%, 20%, and 30%), DIF type (uniform and nonuniform), and sample size (500, 1000, and 2000). Item scores were generated using the graded response model. The MLR procedures were consistently able to detect both uniform and nonuniform DIF. When the two-step MLR procedure was used, the false-positive rate (the proportion of non-DIF items that were detected as DIF) decreased and the correct identification rate increased slightly. The purification process improved the correct detection rate only under conditions of uniform DIF, large sample size, and a large amount of DIF. For nonuniform DIF there was no difference between the MLR-WP and MLR-TP procedures.
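
To make the two-step logic concrete, here is a minimal sketch of purification with a likelihood ratio DIF test. It is the dichotomous analogue of the iterative MLR procedure described above (the article's items are polytomous); the function names, the single purification pass, and the use of statsmodels are our illustrative choices:

```python
# Two-step DIF purification: flag items using the raw total score, then
# re-test all items while matching on a score that excludes flagged items.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def lr_dif_pvalues(responses, group, match):
    """Per-item likelihood ratio test: does adding group and group-by-score
    terms improve on a logistic model with the matching score alone?"""
    pvals = []
    for j in range(responses.shape[1]):
        y = responses[:, j]
        x0 = sm.add_constant(match)                       # reduced model
        x1 = sm.add_constant(np.column_stack([match, group, match * group]))
        ll0 = sm.Logit(y, x0).fit(disp=0).llf
        ll1 = sm.Logit(y, x1).fit(disp=0).llf
        pvals.append(chi2.sf(2 * (ll1 - ll0), df=2))      # 2 extra parameters
    return np.asarray(pvals)

def purified_dif(responses, group, alpha=0.05):
    total = responses.sum(axis=1)
    flagged = lr_dif_pvalues(responses, group, total) < alpha   # step 1
    purified = responses[:, ~flagged].sum(axis=1)               # drop DIF items
    return lr_dif_pvalues(responses, group, purified) < alpha   # step 2
```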
35

Huang, Hung-Yu. "Effects of the Common Scale Setting in the Assessment of Differential Item Functioning." Psychological Reports 114, no. 1 (February 2014): 104–25. http://dx.doi.org/10.2466/03.pr0.114k11w0.

Abstract:
This study compares three methods of detecting differential item functioning (DIF), the equal mean difficulty (EMD), all-other-item (AOI), and constant item (CI) methods, in terms of estimation bias and rank order change of ability estimates using a series of simulations and two empirical examples. The CI method generated accurate DIF parameter estimates, whereas the EMD and AOI methods produced biased estimates. Moreover, as the percentage of DIF items in a test increased, the superiority of the CI method over the EMD and AOI methods became more apparent. The superiority of the CI method is independent of the sample size, test length, and item type (dichotomous or polytomous). Two empirical examples, a mathematics test and a hostility questionnaire, demonstrated that these three methods yielded inconsistent DIF detections and produced different ability estimate rankings.
36

Gadermann, Anne M., Michelle Y. Chen, Scott D. Emerson, and Bruno D. Zumbo. "Examining Validity Evidence of Self-Report Measures Using Differential Item Functioning." Methodology 14, no. 4 (October 1, 2018): 165–76. http://dx.doi.org/10.1027/1614-2241/a000156.

Abstract:
The investigation of differential item functioning (DIF) is important for any group comparison because the validity of the inferences made from scale scores could be compromised if DIF is present. DIF occurs when individuals from different groups show different probabilities of selecting a response option to an item after being matched on the underlying latent variable that the item is supposed to measure. The aim of this paper is to inform the practice of DIF analyses in survey research. We focus on three quantitative methods to detect DIF, namely nonparametric item response theory (NIRT), ordinal logistic regression (OLR), and mixed-effects or multilevel models. Using these methods, we demonstrate how to examine DIF at the item and scale levels, as well as in multilevel settings. We discuss when these techniques are appropriate to use, what data assumptions they have, and their advantages and disadvantages in the analysis of survey data.
37

Acar, Tülin. "Determination of a Differential Item Functioning Procedure Using the Hierarchical Generalized Linear Model." SAGE Open 2, no. 1 (January 1, 2012): 215824401243676. http://dx.doi.org/10.1177/2158244012436760.

Abstract:
The aim of this research is to compare the results of differential item functioning (DIF) detection using the hierarchical generalized linear model (HGLM) technique with the results of DIF detection using the logistic regression (LR) and item response theory likelihood ratio (IRT-LR) techniques on the same test items. To this end, it is first determined, with the HGLM, LR, and IRT-LR techniques, whether items exhibit DIF according to socioeconomic status (SES) in the Turkish, Social Sciences, and Science subtests of the Secondary School Institutions Examination. When inspecting the correlations among the techniques in terms of identifying the items having DIF, a significant correlation between the results of the IRT-LR and LR techniques was found in all subtests; only in the Science subtest was the correlation between the HGLM and IRT-LR results significant. Future work can apply other DIF analysis techniques that were outside the scope of this research and compare the results obtained with these techniques across different sample sizes.
38

Soares, Tufi M., Flávio B. Gonçalves, and Dani Gamerman. "An Integrated Bayesian Model for DIF Analysis." Journal of Educational and Behavioral Statistics 34, no. 3 (September 2009): 348–77. http://dx.doi.org/10.3102/1076998609332752.

Abstract:
In this article, an integrated Bayesian model for differential item functioning (DIF) analysis is proposed. The model is integrated in the sense of modeling the responses along with the DIF analysis, which allows DIF detection and explanation in a simultaneous setup. Previous empirical studies and/or subjective beliefs about the item parameters, including differential functioning behavior, may be conveniently expressed in terms of prior distributions. Indicator variables estimated within the model flag which items have DIF and which do not; as a result, the data analyst may not be required to specify a priori an “anchor set” of DIF-free items to identify the model. This reduces the need for the iterative procedures commonly used for proficiency purification and DIF detection and explanation. Examples demonstrate the efficiency of this method in simulated and real situations.
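The core idea, response modeling and DIF detection in a single likelihood, can be sketched roughly as follows under assumed notation (not the authors' exact parameterization): group indicator G_j, latent DIF indicator I_i with a Bernoulli prior, and DIF magnitude delta_i with a normal prior.

    % Rough sketch of an indicator-augmented 2PL likelihood (assumed notation)
    P(X_{ij} = 1 \mid \theta_j) =
      \frac{1}{1 + \exp\{-a_i\,(\theta_j - b_i - I_i\,\delta_i\,G_j)\}},
    \qquad I_i \sim \mathrm{Bernoulli}(\pi), \quad \delta_i \sim \mathcal{N}(0, \sigma^2_\delta)

The posterior probability that I_i = 1 then serves as the DIF flag for item i, which is what lets the model sidestep a prespecified anchor set.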
APA, Harvard, Vancouver, ISO, and other styles
40

González-Betanzos, Fabiola, and Francisco J. Abad. "The Effects of Purification and the Evaluation of Differential Item Functioning With the Likelihood Ratio Test." Methodology 8, no. 4 (January 1, 2012): 134–45. http://dx.doi.org/10.1027/1614-2241/a000046.

Full text
Abstract:
The current research compares the effects of several strategies for establishing the anchor subtest when testing for differential item functioning (DIF) using the IRT likelihood ratio test in one- and two-stage procedures. Two one-stage strategies were examined: (1) “One item” and (2) “All other items” used as anchor. Additionally, two two-stage strategies were tested: (3) “One anchor item with posterior anchor test augmentation” and (4) “All other items with purification.” The strategies were compared in a simulation study in which sample size, DIF size, type of DIF, and software implementation (MULTILOG vs. IRTLRDIF) were manipulated. Results indicated that Procedure (1) was more efficient than Procedure (2). Purification substantially improved Type I error rates with the “all other items” strategy, whereas “posterior anchor test augmentation” did not yield a significant improvement. Regarding the software used, MULTILOG generally offered better results than IRTLRDIF.
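To illustrate the logic of strategy (4), the sketch below implements a generic purification loop. The lr_test callable is a hypothetical stand-in for a full IRT-LR DIF test (such as the MULTILOG or IRTLRDIF runs the authors performed): it takes the studied item and the current anchor set and returns a p-value.

    # Schematic "all other items with purification" loop (hypothetical lr_test)
    from typing import Callable, List, Set

    def purified_dif(items: List[int],
                     lr_test: Callable[[int, Set[int]], float],
                     alpha: float = 0.05,
                     max_rounds: int = 10) -> Set[int]:
        """Iteratively re-test items, removing flagged ones from the anchor."""
        flagged: Set[int] = set()
        for _ in range(max_rounds):
            new_flags = set()
            for item in items:
                anchor = set(items) - {item} - flagged  # all other non-DIF items
                if lr_test(item, anchor) < alpha:       # significant = DIF flag
                    new_flags.add(item)
            if new_flags == flagged:                    # converged: flags stable
                return flagged
            flagged = new_flags
        return flagged

The loop stops once the flagged set stabilizes, so DIF items identified in earlier rounds no longer contaminate the anchor, which is the mechanism behind the improved Type I error rates reported above.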
APA, Harvard, Vancouver, ISO, and other styles
41

Rijkeboer, Marleen M., Huub van den Bergh, and Jan van den Bout. "Item Bias Analysis of the Young Schema-Questionnaire for Psychopathology, Gender, and Educational Level." European Journal of Psychological Assessment 27, no. 1 (January 2011): 65–70. http://dx.doi.org/10.1027/1015-5759/a000044.

Full text
Abstract:
This study examines the construct validity of the Young Schema-Questionnaire (YSQ) at the item level in a Dutch population. Possible bias of items in relation to the presence or absence of psychopathology, gender, and educational level was analyzed using a cross-validation design. None of the items of the YSQ exhibited differential item functioning (DIF) for gender, and only one item showed DIF for educational level. Furthermore, item bias analysis did not identify DIF for the presence or absence of psychopathology for 195 of the 205 items comprising the YSQ. The remaining ten items, spread over the questionnaire, yielded relatively inconsistent response patterns for patients and nonclinical participants.
APA, Harvard, Vancouver, ISO, and other styles
42

Goel, Ashish, and Alden Gross. "Differential item functioning in the cognitive screener used in the Longitudinal Aging Study in India." International Psychogeriatrics 31, no. 9 (February 20, 2019): 1331–41. http://dx.doi.org/10.1017/s1041610218001746.

Full text
Abstract:
Introduction: The Longitudinal Aging Study in India (LASI) was initiated to capture data comparable to the Health and Retirement Survey (HRS) and hence used study instruments from the HRS. However, a rigorous psychometric evaluation before adaptation of the cognitive tests might have indicated bias due to diversities across Indian states, such as education, ethnicity, and urbanicity. In the present analysis, we evaluated whether items show differential item functioning (DIF) by literacy, urbanicity, and education status. Methods: We calculated proportions for each item and weighted descriptive statistics of demographic characteristics in LASI. Next, we evaluated item-level measurement differences by testing for DIF using the alignment approach implemented in Mplus software. Observations: We found that cognitive items in the LASI interview demonstrate bias by education and literacy, but not urbanicity. Items relating to animal (word) fluency show DIF. The model rates correct identification of the prime minister as the most difficult binary response item, whereas the day-of-the-week and numeracy items are rated comparatively easier. Conclusions: Our study facilitates comparison across education, literacy, and urbanicity to support analyses of differences in cognitive status. This should help future instrument development efforts by recognizing potentially problematic items in certain subgroups.
APA, Harvard, Vancouver, ISO, and other styles
43

Fallon, Lindsay M., Sadie C. Cathcart, and Austin H. Johnson. "Assessing Differential Item Functioning in a Teacher Self-Assessment of Cultural Responsiveness." Journal of Psychoeducational Assessment 39, no. 7 (June 17, 2021): 816–31. http://dx.doi.org/10.1177/07342829211026464.

Full text
Abstract:
The Assessment of Culturally and Contextually Relevant Supports (ACCReS) was developed in response to the need for well-constructed instruments to measure teachers’ cultural responsiveness and guide decision-making related to professional development needs. The current study sought to evaluate the presence of differential item functioning (DIF) in ACCReS items and the magnitude of DIF, if detected. With a national sample of 999 grade K-12 teachers in the United States, we examined measurement invariance of ACCReS items in relation to responses from (a) racially and ethnically minoritized (REM) and white teachers (teacher race), (b) teachers in schools with 0–50% and 51–100% REM youth (student race), and (c) teachers with <1–5 years of teaching experience and teachers with >5 years of experience. Findings suggested that ACCReS items exhibited negligible levels of DIF, providing additional evidence for the validity of scores from the ACCReS as a measure of teachers’ cultural responsiveness. Furthermore, descriptive analyses revealed that teachers were more likely to agree with items pertaining to their own classroom practice than with items related to access to adequate training and support. The results carry implications for future educational and measurement research.
APA, Harvard, Vancouver, ISO, and other styles
44

Schafer, John, and Cheryl J. Cherpitel. "Differential item functioning of the CAGE, TWEAK, BMAST and AUDIT by gender and ethnicity." Contemporary Drug Problems 25, no. 2 (June 1998): 399–409. http://dx.doi.org/10.1177/009145099802500207.

Full text
Abstract:
A number of screening instruments for alcohol dependence (AD) or problem drinking are currently in use in primary care settings. This study tests for differential item functioning (DIF) by gender and ethnicity in the CAGE, TWEAK, brief MAST (BMAST), and AUDIT among 492 emergency room patients with lifetime drinking experience. As the reference standard, the Composite International Diagnostic Interview (CIDI) was used to establish whether the participant was either alcohol dependent or a harmful drinker according to ICD-10 criteria. DIF analyses suggested that 38% of the items on the four screening instruments showed either gender or ethnic DIF.
APA, Harvard, Vancouver, ISO, and other styles
45

Wahyudi, Hasbi. "Pengaplikasian Multiple Indicator Multiple Causes (MIMIC) Model dalam Mendeteksi Differential Item Functioning (DIF) pada Alat Ukur Social Quality of Life." Jurnal Pengukuran Psikologi dan Pendidikan Indonesia (JP3I) 8, no. 1 (November 25, 2019): 25–36. http://dx.doi.org/10.15408/jp3i.v8i1.12851.

Full text
Abstract:
This study aims to detect differential item functioning (DIF) in a quality-of-life instrument measuring one aspect, namely social quality of life. The social quality of life scale contains 24 items developed from the Patient Reported Outcomes Measurement Information System (PROMIS) of the National Institutes of Health (NIH). The instrument measures quality of life in the social function domain for adolescent patients suffering from chronic diseases or medical conditions. DIF detection in this study uses a special case of CFA, namely CFA with covariates, also known as the multiple indicator multiple causes (MIMIC) model. The study involved 322 participants, 117 (36%) male and 205 (64%) female, aged 13-23 years, in Riau Province. A first-order CFA on the set of social quality of life items retained 22 valid items. The MIMIC analysis then showed that the model fits the data (RMSEA = 0.048) and identified two items containing DIF: item 5 (0.135, p = 0.002), “I have a close friend,” and item 23 (0.308, p = 0.002), “I hope to have lots of friends.” Keywords: Social quality of life, MIMIC model, differential item functioning.
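For readers who want to reproduce the MIMIC logic, a minimal sketch follows, assuming the Python semopy package and hypothetical variable and file names (the article does not name its software). In a MIMIC model, a significant direct path from the covariate to an item, over and above the covariate's effect on the latent factor, indicates uniform DIF.

    # Minimal MIMIC DIF sketch with semopy (an assumption; names are hypothetical)
    import pandas as pd
    import semopy

    # Hypothetical data file with item responses item1..item6 and gender (0/1)
    df = pd.read_csv("social_qol.csv")

    # One-factor MIMIC model: factor regressed on the covariate (group impact),
    # plus a direct covariate -> item path whose significance signals uniform DIF
    desc = (
        "SQoL =~ item1 + item2 + item3 + item4 + item5 + item6\n"
        "SQoL ~ gender\n"
        "item5 ~ gender"
    )
    model = semopy.Model(desc)
    model.fit(df)
    print(model.inspect())  # check the item5 ~ gender estimate and its p-value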
APA, Harvard, Vancouver, ISO, and other styles
46

Teresi, Jeanne A., Mildred Ramirez, Richard N. Jones, Seung Choi, and Paul K. Crane. "Modifying Measures Based on Differential Item Functioning (DIF) Impact Analyses." Journal of Aging and Health 24, no. 6 (March 15, 2012): 1044–76. http://dx.doi.org/10.1177/0898264312436877.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Vaughn, Brandon K., and Qiu Wang. "DIF Trees: Using Classification Trees to Detect Differential Item Functioning." Educational and Psychological Measurement 70, no. 6 (August 27, 2010): 941–52. http://dx.doi.org/10.1177/0013164410379326.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Effiom, Anthony Pius. "Test fairness and assessment of differential item functioning of mathematics achievement test for senior secondary students in Cross River state, Nigeria using item response theory." Global Journal of Educational Research 20, no. 1 (August 18, 2021): 55–62. http://dx.doi.org/10.4314/gjedr.v20i1.6.

Full text
Abstract:
This study used an item response theory (IRT) approach to assess differential item functioning (DIF) and detect item bias in a Mathematics Achievement Test (MAT). The MAT was administered to 1,751 SS2 students in public secondary schools in Cross River State. An instrumentation research design was used to develop and validate the 50-item instrument. Data were analysed using the maximum likelihood estimation technique of the BILOG-MG V3 software. The results revealed that 6% of the items exhibited differential item functioning between male and female students. Based on the analysis, the study observed sex bias in some of the MAT test items. DIF analysis, which attempts to eliminate irrelevant factors and sources of bias of any kind so that a test yields valid results, is among the best current approaches to fair testing. Test developers and policymakers are therefore advised to exercise care in fair test practice by dedicating effort to unbiased test development and decision making. Examination bodies should adopt item response theory in educational testing, and test developers should be mindful of test items that can produce biased response patterns between male and female students or any other subgroup of interest. Keywords: Assessment, Differential Item Functioning, Validity, Reliability, Test Fairness, Item Bias, Item Response Theory.
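As a hedged illustration of one way sex DIF can be checked after separate IRT calibrations, the sketch below applies a Lord-style Wald test to an item's difficulty parameter; this is a generic technique, not necessarily the BILOG-MG V3 procedure used in the study, and the estimates are made up for illustration.

    # Lord-style Wald test on difficulty estimates from two group calibrations
    # (illustrative values; calibrations must first be linked to a common metric)
    import numpy as np
    from scipy import stats

    b_male, se_male = 0.35, 0.08      # difficulty estimate and SE, male group
    b_female, se_female = 0.62, 0.09  # difficulty estimate and SE, female group

    z = (b_female - b_male) / np.sqrt(se_male**2 + se_female**2)
    p = 2 * stats.norm.sf(abs(z))     # two-sided test of equal difficulty
    print(f"z = {z:.2f}, p = {p:.3f}")  # p < .05 suggests uniform DIF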
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Wen-Chung, Ching-Lin Shih, and Guo-Wei Sun. "The DIF-Free-Then-DIF Strategy for the Assessment of Differential Item Functioning." Educational and Psychological Measurement 72, no. 4 (January 4, 2012): 687–708. http://dx.doi.org/10.1177/0013164411426157.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Paulsen, Justin, Dubravka Svetina, Yanan Feng, and Montserrat Valdivia. "Examining the Impact of Differential Item Functioning on Classification Accuracy in Cognitive Diagnostic Models." Applied Psychological Measurement 44, no. 4 (July 4, 2019): 267–81. http://dx.doi.org/10.1177/0146621619858675.

Full text
Abstract:
Cognitive diagnostic models (CDMs) are of growing interest in educational research because of the models’ ability to provide diagnostic information regarding examinees’ strengths and weaknesses suited to a variety of content areas. An important step to ensure appropriate uses and interpretations from CDMs is to understand the impact of differential item functioning (DIF). While methods of detecting DIF in CDMs have been identified, there is a limited understanding of the extent to which DIF affects classification accuracy. This simulation study provides a reference to practitioners to understand how different magnitudes and types of DIF interact with CDM item types and group distributions and sample sizes to influence attribute- and profile-level classification accuracy. The results suggest that attribute-level classification accuracy is robust to DIF of large magnitudes in most conditions, while profile-level classification accuracy is negatively influenced by the inclusion of DIF. Conditions of unequal group distributions and DIF located on simple structure items had the greatest effect in decreasing classification accuracy. The article closes by considering implications of the results and future directions.
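To make concrete how DIF enters a CDM item, the sketch below simulates a single DINA-model item in which the focal group faces a higher slip and lower guess parameter; all values are illustrative assumptions, not the study's simulation design.

    # DINA item with group-dependent slip/guess parameters (illustrative values)
    import numpy as np

    rng = np.random.default_rng(7)
    n = 1000
    K = 3                                     # number of attributes
    alpha = rng.integers(0, 2, size=(n, K))   # examinee attribute profiles
    group = rng.integers(0, 2, size=n)        # 0 = reference, 1 = focal
    q = np.array([1, 1, 0])                   # Q-matrix row: required attributes

    eta = np.all(alpha >= q, axis=1)          # mastery of all required attributes
    # DIF: focal group gets a higher slip (s) and lower guess (g) parameter
    s = np.where(group == 0, 0.10, 0.25)
    g = np.where(group == 0, 0.20, 0.10)
    p_correct = np.where(eta, 1 - s, g)       # DINA response probability
    x = rng.binomial(1, p_correct)

    # Masters in each group should succeed at rate 1 - s; DIF shows as a gap
    print("focal masters:", x[(group == 1) & eta].mean())
    print("reference masters:", x[(group == 0) & eta].mean())

Because classification in a CDM depends on the contrast between mastery and non-mastery response rates, shrinking that contrast for one group (as above) degrades profile-level accuracy first, consistent with the pattern the abstract reports.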
APA, Harvard, Vancouver, ISO, and other styles