Pub. online: 2 Feb 2023
Type: Statistical Data Science
Open Access
Journal: Journal of Data Science
Volume 21, Issue 2 (2023): Special Issue: Symposium Data Science and Statistics 2022, pp. 391–411
Abstract
Traditional methods for evaluating a potential treatment have focused on the average treatment effect. However, there are situations in which individuals respond to a treatment in significantly heterogeneous ways. In these situations, one needs to account for the differences among individuals when estimating the treatment effect. Li et al. (2022) proposed a method based on a random forest of interaction trees (RFIT) for a binary or categorical treatment variable, incorporating the propensity score into the construction of the random forest. Motivated by the need to evaluate the effect of tutoring sessions at a Math and Stat Learning Center (MSLC), we extend their approach to an ordinal treatment variable. Our approach improves upon RFIT for multiple treatments by incorporating the ordered structure of the treatment variable into the tree-growing process. To illustrate the effectiveness of the proposed method, we conduct simulation studies, which show that it achieves a lower mean squared error and a higher rate of correct optimal treatment classification, and that it identifies the most important variables affecting the treatment effect. We then apply the proposed method to estimate how the number of visits to the MSLC affects an individual student's probability of passing an introductory statistics course. Our results show that every student is recommended to visit the MSLC at least once, and that some students can drastically improve their chances of passing the course by visiting the optimal number of times suggested by our analysis.
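The idea of an individual "optimal number of visits" can be illustrated with a minimal sketch. It is not the RFIT estimator described in the abstract: it assumes that some model has already produced, for each student, a predicted pass probability at each ordinal treatment level, and it simply selects the level with the highest prediction. The function names, the tie-breaking rule (prefer fewer visits), and all numbers below are hypothetical, and the classification rate shown is one plausible reading of "optimal treatment classification", not a definition taken from the paper.

```python
def optimal_level(predicted):
    """Return the treatment level with the highest predicted outcome.

    `predicted` maps an ordinal level (e.g. number of visits) to a predicted
    pass probability. Ties go to the lowest level (an assumption made here).
    """
    best = max(predicted.values())
    return min(level for level, prob in predicted.items() if prob == best)


def classification_rate(estimated, true):
    """Fraction of individuals whose estimated optimal level equals the true one."""
    if len(estimated) != len(true):
        raise ValueError("inputs must have the same length")
    matches = sum(1 for e, t in zip(estimated, true) if e == t)
    return matches / len(true)


# Hypothetical predicted pass probabilities for three students at 0..3 visits.
students = [
    {0: 0.40, 1: 0.55, 2: 0.70, 3: 0.65},
    {0: 0.60, 1: 0.80, 2: 0.80, 3: 0.75},
    {0: 0.30, 1: 0.35, 2: 0.50, 3: 0.90},
]
estimated_optima = [optimal_level(s) for s in students]

# Hypothetical "true" optima, as would be known only in a simulation study.
true_optima = [2, 1, 2]
rate = classification_rate(estimated_optima, true_optima)
```

With these made-up inputs, the second student's tie between one and two visits resolves to one visit, and two of the three estimated optima match the assumed truth.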
This study investigates whether Support Vector Machines (SVMs) can be used to predict the problem-solving performance of students in a computer-based learning environment. SVM models using RBF, linear, polynomial, and sigmoid kernels were developed to estimate the probability that middle school students answer mathematics problems correctly on their first attempt, without using the hints available in the computer-based learning environment, based on their past problem-solving performance. The SVM models gave better predictions than the standard Bayesian Knowledge Tracing (BKT) model, one of the most widely used prediction models in educational data mining research, in terms of the Area Under the receiver operating characteristic Curve (AUC). The four SVM models achieved AUC values from 0.73 to 0.77, an improvement of approximately 29% over the standard BKT model, whose AUC was 0.58.
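The AUC comparison above can be checked with a short sketch. It computes AUC from its pairwise-ranking definition on toy data, then the relative improvement of the reported SVM AUC range over the reported BKT AUC of 0.58. The function names and the toy labels and scores are hypothetical, and using the midpoint of the range (0.75) to reproduce the "approximately 29%" figure is an assumption, since the abstract does not say how that figure was derived.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) form."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, as a fraction of `baseline`."""
    return (new - baseline) / baseline


# Toy example (hypothetical data): 3 of the 4 positive/negative pairs
# are ranked correctly.
toy_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])

# AUC values reported in the abstract.
BKT_AUC = 0.58
SVM_AUC_LOW, SVM_AUC_HIGH = 0.73, 0.77

low = relative_improvement(SVM_AUC_LOW, BKT_AUC)
high = relative_improvement(SVM_AUC_HIGH, BKT_AUC)
# Midpoint of the reported range; an assumption used only for illustration.
mid = relative_improvement((SVM_AUC_LOW + SVM_AUC_HIGH) / 2, BKT_AUC)
```

Under these assumptions the relative improvement ranges from roughly 26% to roughly 33%, and the midpoint gives roughly 29%, consistent with the figure quoted in the abstract.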