Abstract
There are numerous statistical procedures for detecting items that function differently across subgroups of examinees who take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the decisions that affect investigations of differential item functioning (DIF), such as choice of method, sample size, effect size criteria, conditioning variable, purification, DIF amplification, DIF cancellation, and research designs for evaluating DIF. Our review highlights the necessity of matching the DIF procedure to the nature of the data analysed, the need to include effect size criteria, the need to consider the direction and balance of items flagged for DIF, and the need to use replication to reduce Type I errors whenever possible. Directions for future research and practice in using DIF to enhance the validity of test scores are provided.
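To make the abstract's central object concrete, the sketch below illustrates one widely used DIF procedure for dichotomous items, the Mantel–Haenszel method, which compares reference- and focal-group odds of a correct response within matched ability strata. This is not drawn from the article itself; the function name and the per-stratum table layout are assumptions for illustration.

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    `strata` is a list of 2x2 tables, one per matched ability level:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n  # reference-correct x focal-incorrect
        den += b * c / n  # reference-incorrect x focal-correct
    alpha = num / den                # common odds ratio across strata
    delta = -2.35 * math.log(alpha)  # ETS delta scale
    return alpha, delta

# No-DIF example: equal odds in both groups at every ability level.
alpha, delta = mantel_haenszel_dif([(30, 10, 15, 5), (20, 20, 10, 10)])
```

On the ETS delta scale, |delta| < 1 is conventionally treated as negligible DIF (category A), 1 to 1.5 as moderate (B), and above 1.5 as large (C); this effect size classification is one example of the criteria the abstract argues should accompany any statistical flag.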
Publication Info
- Year: 2013
- Type: article
- Volume: 19
- Issue: 2-3
- Pages: 170-187
- Citations: 64
- Access: Closed
Identifiers
- DOI: 10.1080/13803611.2013.767621