From Collecting Log-data to Analyzing Process Indicators with the logLime R Package
Tomasz Żółtak
Full-day short course
The growing interest in using log-data as auxiliary information in psychometric modeling is no surprise, as these data are a rich source of evidence on response processes that are otherwise impossible to measure. Indicators built from log-data, such as response times, sequences of actions, characteristics of cursor movements, or the time the cursor spends over different “areas of interest”, can serve as either outcome or independent variables, providing additional insight into the modeled constructs. Although interest in this type of data originated in international large-scale assessments in education, most notably PISA and PIAAC, log-data can be used in any quantitative study, be it a measurement of personality traits, a public opinion survey, or an assessment of cognitive ability. Despite the potential of log-data, software solutions that enable researchers both to collect log-data and to construct process indicators are still in short supply.
During this full-day course, participants will learn how to use a software solution consisting of the LimeSurvey online survey platform, the logdataLimeSurvey JavaScript applet, and the logLime R package to perform the entire process of 1) collecting raw log-data describing respondent/test-taker actions, 2) cleaning and transforming raw log-data into different types of informative, analytically useful process indicators, 3) visualizing user actions using log-data, and 4) preparing the collected log-data for use with other software.
The course will begin with a short presentation and discussion of what log-data is, how it relates to the broader concept of paradata, how it can be collected, and the different representations in which it can be stored. I will also briefly describe the most widely used process indicators that can be computed from log-data and their applications in psychometric modeling. Following this introduction, I will demonstrate how the logdataLimeSurvey JavaScript applet can be used to collect log-data during surveys or tests administered on the LimeSurvey platform and how the collected data can be exported.
In the next part of the course, participants will learn how to use the logLime R package to:
- Import log-data into R
- Diagnose problems and perform cleaning of the log-data
- Compute process indicators, including: 1) response editing, 2) hovering indices, 3) indicators based on item-level response times, 4) cursor movement indices, 5) survey navigation indices
- Visualize respondent behaviors by 1) drawing heatmaps, 2) drawing cursor traces and clicks, 3) animating cursor traces and clicks (using the ggplot2 and gganimate packages)
- Prepare data for analysis using other software, including the LogFSM, ProcData, or mousetrap R packages
While working through these applications, participants will discover the different types of recorded events and learn how they relate to respondents’ actions. Moreover, we will explore different ways in which log-data can be aggregated to obtain datasets describing either sequences of events or states at predefined time points. Finally, I will discuss the possibilities of using the logLime package with data collected from sources other than the LimeSurvey platform, as well as prospects for the future development of the package.
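To make the distinction between the two aggregation styles concrete, here is a minimal sketch written in Python purely for illustration; the course itself uses R, and nothing below is part of the logLime package. The event names, timestamps, and the helper `states_on_grid` are all hypothetical. The sketch converts a sequence of timestamped events into states sampled at predefined time points by carrying the most recent event forward.

```python
from bisect import bisect_right

# Hypothetical event log for one respondent on one survey page:
# (milliseconds since page load, event type). Names are illustrative only.
events = [
    (0, "pageLoaded"),
    (400, "mouseMove"),
    (1200, "click"),
    (2500, "submit"),
]


def states_on_grid(events, step_ms, end_ms):
    """Return the most recent event type at each grid point 0, step_ms, 2*step_ms, ...

    `events` must be sorted by time. Grid points that precede the first
    event get the state None.
    """
    times = [t for t, _ in events]
    states = []
    for grid_time in range(0, end_ms + 1, step_ms):
        # Index of the last event that occurred at or before grid_time
        i = bisect_right(times, grid_time) - 1
        states.append((grid_time, events[i][1] if i >= 0 else None))
    return states


# Sample the state once per second over the first three seconds
states = states_on_grid(events, step_ms=1000, end_ms=3000)
for grid_time, state in states:
    print(grid_time, state)
```

The event representation keeps every recorded action with its exact timestamp, whereas the state representation has one row per time point, which may be more convenient for methods that expect equally spaced observations.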
The course is intended primarily for researchers, students, and practitioners interested in exploring opportunities to collect and use log-data in their own studies. It will also be useful for anyone in psychometrics, measurement, education, psychology, and other fields who is interested in using log-data and process indicators to model the behavior of respondents (test takers).
Prior to the course, I will provide instructions for downloading and installing R and the packages required for the workshop. Some prior experience with R is useful but not necessary. Participants will be provided with the log-data to be analyzed during the workshop and do not need access to the LimeSurvey platform. Please note that the course will not cover the process of creating and administering surveys on the LimeSurvey platform.
About the Instructor
Tomasz Żółtak
Dr. Tomasz Żółtak is a research scientist at the Institute of Philosophy and Sociology of the Polish Academy of Sciences. His research interests center on applied statistics and research methodology, mostly in the fields of educational research and political science.
As a member of the Computational Social Science Department, he is currently working on a project exploring response styles in surveys, which led him to develop the logLime R package and the logdataLimeSurvey JS applet, enabling log-data collection and analysis using open software.
For many years he worked at the Educational Research Institute in Warsaw, where he conducted research on value-added indicators of school effectiveness and on tracking the educational and professional careers of graduates. He also has experience in teaching (courses on causal inference for PhD students at the University of Warsaw), in implementing advanced psychometric methods in business solutions (collaboration with the Polish companies Diagmatic and Talent Bridge), and in software development (as author of and contributor to several R packages).