Computerized Adaptive Testing and Multistage Testing with R
Duanli Yan & Alina A. von Davier
Full-day short course
The goal of this workshop is to provide a practical (and brief) overview of the theory of computerized adaptive testing (CAT) and multistage testing (MST), and to illustrate the methodologies and applications using the open-source R language and several data examples. The implementations rely on the R packages catR and mstR, which have already been developed or are under development and which include some of the newest research algorithms developed by the authors.
This workshop will cover several topics: the basics of R, a theoretical overview of CAT and MST, CAT and MST designs, assembly methodologies, the catR and mstR packages, simulations, and applications.
The intended audience for the workshop includes undergraduate and graduate students, faculty, researchers, practitioners at testing institutions, and anyone in psychometrics, measurement, education, psychology, and other fields who is interested in computerized adaptive and multistage testing, especially in practical implementations of simulations using R.
Summary
Computerized adaptive testing (CAT) has become a very popular method for administering questionnaires, collecting data, and scoring on the fly (van der Linden & Glas, 2010; Wainer, 2015). It has been used in many large-scale assessments over the last decades and is currently an important field of research in psychometrics. Multistage testing (MST), on the other hand, has gained popularity in recent years (Yan, von Davier, & Lewis, 2014).
Both approaches rely on the notion of adaptive testing: items are administered sequentially and selected optimally according to the responses to the items administered so far. In other words, the selection of the next items depends on a current, interim estimate of ability that is based on the previously administered items. The conceptual difference between CAT and MST is that in CAT, items are selected one at a time (from a large pool of available items) and the test taker's ability is estimated after the administration of each item. In MST, however, items are grouped into predefined modules, and the selection of subsequent modules is based on performance on the previously administered modules rather than on single items (Magis, Yan, & von Davier, 2017).
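The item-level adaptation described above can be sketched in a few lines. The Python fragment below is a minimal illustration under stated assumptions, not the catR implementation: it assumes a two-parameter logistic (2PL) IRT model, a small hypothetical item bank of (a, b) parameters, maximum Fisher information item selection, expected a posteriori (EAP) ability estimation with a standard normal prior, and a fixed test length as the stopping rule. All function names are hypothetical.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses, bank, grid=None):
    """EAP ability estimate under a standard normal prior, on a fixed grid."""
    if grid is None:
        grid = [-4.0 + 0.1 * k for k in range(81)]  # -4.0 to 4.0 in steps of 0.1
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior density
        for idx, x in responses:
            a, b = bank[idx]
            p = p_correct(theta, a, b)
            w *= p if x == 1 else 1.0 - p  # multiply in each response likelihood
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


def run_cat(bank, true_theta, test_length, seed=0):
    """Simulate one fixed-length CAT for a test taker with ability true_theta."""
    rng = random.Random(seed)
    responses = []  # administered (item index, 0/1 response) pairs
    used = set()
    theta_hat = 0.0  # provisional estimate starts at the prior mean
    for _ in range(min(test_length, len(bank))):
        # Maximum Fisher information: choose the unused item that is most
        # informative at the current provisional ability estimate.
        candidates = [i for i in range(len(bank)) if i not in used]
        next_item = max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))
        used.add(next_item)
        a, b = bank[next_item]
        # Simulate the response from the (known) true ability.
        x = 1 if rng.random() < p_correct(true_theta, a, b) else 0
        responses.append((next_item, x))
        # Re-estimate ability after every administered item.
        theta_hat = eap_estimate(responses, bank)
    return theta_hat, responses
```

For example, `run_cat(bank, true_theta=0.5, test_length=10)` with a hypothetical bank such as `[(1.0 + 0.1 * (k % 5), -2.0 + 0.2 * k) for k in range(21)]` returns the final estimate and the administered (item, response) pairs. A precision-based rule, such as stopping once the posterior standard deviation falls below a threshold, could replace the fixed-length loop.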
Both methods have advantages and drawbacks relative to each other and to linear testing. However, their practical usefulness relies mostly on accurate implementations of the algorithms for test assembly, optimal item or module selection, item response theory (IRT) scoring, stopping rules, and reporting.
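Module-level adaptation can be sketched in the same spirit. The Python fragment below is a minimal illustration under stated assumptions, not the mstR implementation: it assumes a hypothetical 1-3 design (one routing module followed by one of three preassembled second-stage modules), 2PL items with made-up parameters, and number-correct routing with hypothetical cut scores. Routing on a provisional IRT ability estimate is a common alternative.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical 1-3 design: each module is a fixed, preassembled list of
# (a, b) item parameters (the values are illustrative only).
ROUTING = [(1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
STAGE2 = {
    "easy": [(1.0, -2.0), (1.0, -1.5), (1.0, -1.0), (1.0, -0.5), (1.0, 0.0)],
    "medium": [(1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)],
    "hard": [(1.0, 0.0), (1.0, 0.5), (1.0, 1.0), (1.0, 1.5), (1.0, 2.0)],
}


def administer(module, theta, rng):
    """Simulate 0/1 responses to every item of a module."""
    return [1 if rng.random() < p_correct(theta, a, b) else 0 for a, b in module]


def route(number_correct, cuts=(1, 3)):
    """Map a routing-module number-correct score to a stage-2 module.

    Scores <= cuts[0] go to "easy", scores <= cuts[1] go to "medium",
    and higher scores go to "hard" (hypothetical cut scores).
    """
    if number_correct <= cuts[0]:
        return "easy"
    if number_correct <= cuts[1]:
        return "medium"
    return "hard"


def run_mst(true_theta, seed=0):
    """Simulate one two-stage MST administration."""
    rng = random.Random(seed)
    stage1 = administer(ROUTING, true_theta, rng)
    path = route(sum(stage1))  # adapt once, based on the whole module
    stage2 = administer(STAGE2[path], true_theta, rng)
    return path, stage1, stage2
```

Adaptation happens only once, between stages; within a module, every item is administered regardless of the individual responses, which is the contrast with item-level CAT described in the Summary.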
For CAT, several commercial software packages exist (e.g., CATSim and Adaptest), as do some open-source solutions for simulation studies, most of them implemented in R: among others, the packages catR (Magis & Barrada, in press; Magis & Raîche, 2012) and mirtCAT (Chalmers, 2015), and the R-based software Firestar (Choi, 2009). For MST, MSTGen (Han, 2013) is available. Very recently, the R package mstR was developed to provide a tool for simulations in the MST context, similar to what the catR package offers for CAT.
The purpose of this workshop is threefold: a) to provide a brief overview of the CAT and MST approaches and outline their specific features, advantages, and drawbacks relative to linear testing, as well as their technical challenges; b) to present the R packages catR and mstR, their options, and their performance from a simulation-study perspective; c) to run several examples of CAT and MST with both packages as illustrations.
The workshop will mix theoretical and practical content. Demonstrations of catR and mstR will be used to illustrate the theoretical framework. Participants are encouraged to bring laptops with R preinstalled (and possibly also the catR and mstR packages, although these can be installed at the beginning of the workshop). Although R is available for Windows, Linux/UNIX, and macOS, the demos will be run under Windows 7. Handouts and R scripts will be made available to participants.
References
- Chalmers, P. (2015). mirtCAT: Computerized adaptive testing with multidimensional item response theory. R package version 0.6.1. http://CRAN.R-project.org/package=mirtCAT
- Choi, S. W. (2009). Firestar: Computerized adaptive testing simulation program for polytomous item response theory models. Applied Psychological Measurement, 33, 644-645.
- Han, K. T. (2013). MSTGen: Simulated data generator for multistage testing. Applied Psychological Measurement, 37, 666-668.
- Magis, D., & Barrada, J. R. (in press). Computerized adaptive testing with R: Recent updates of the package catR. Journal of Statistical Software.
- Magis, D., & Raîche, G. (2012). Random generation of response patterns under computerized adaptive testing with the R package catR. Journal of Statistical Software, 48, 1-31.
- Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R. New York: Springer.
- van der Linden, W. J., & Glas, C. A. W. (2010). Elements of Computerized Adaptive Testing. New York: Springer.
- Wainer, H. (2015). Computerized Adaptive Testing: A Primer (2nd ed.). Routledge.
- Yan, D., von Davier, A. A., & Lewis, C. (Eds.). (2014). Computerized Multistage Testing: Theory and Applications. London: Chapman and Hall.
About the instructors
Duanli Yan
Dr. Duanli Yan is Director of Data Analysis and Computational Research for the Automated Scoring group in the Research and Development division at Educational Testing Service (ETS). She is also an Adjunct Professor at Fordham University. She was a psychometrician for several operational programs, where she led the EXADEP™ test and the TOEIC® Institutional programs, and a development scientist for innovative research applications. She received the 2011 ETS Presidential Award, the 2013 NCME Brenda Loyd Award, the 2015 IACAT Early Career Award, and the 2016 AERA Division D Significant Contribution to Educational Measurement and Research Methodology Award. She is a co-editor of the volumes Computerized Multistage Testing: Theory and Applications and Handbook of Automated Scoring: Theory into Practice, and a co-author of the books Bayesian Networks in Educational Assessment and Computerized Adaptive and Multistage Testing with R. She has presented training sessions and workshops at meetings of the National Council on Measurement in Education (NCME) and the International Association for Computerized Adaptive Testing (IACAT), and at the International Meeting of the Psychometric Society (IMPS).
Alina A. von Davier
Alina A. von Davier is a psychometrician and researcher in computational psychometrics, machine learning, and education. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. She is the Chief of Assessment at Duolingo, where she leads the Duolingo English Test research and development area. She is also the Founder and CEO of EdAstra Tech, a service-oriented EdTech company. In 2022, she joined the University of Oxford as an Honorary Research Fellow and Carnegie Mellon University as a Senior Research Fellow.