Prof. Neil Heffernan, Advisor, WPI - Computer Science
Prof. Joseph Beck, WPI - Computer Science
Prof. Jacob Whitehill, WPI - Computer Science
Dr. Adam Kalai, Microsoft Research (External Member)
Personalized learning recognizes that the causal effect of a learning intervention may differ from student to student (e.g., girls might do better with video hints while boys do better with text hints). To evaluate a learning intervention inside ASSISTments, we run a randomized controlled trial (RCT), randomly assigning students to either a control condition or a treatment condition. Counterfactual inference answers “what if” questions, such as “Would this particular student benefit more from a video hint than from a text hint when the student cannot solve a problem?” Counterfactual prediction provides a way to estimate individual treatment effects, which in turn helps us assign each student to the learning intervention that leads to better learning. We applied a version of Michael Jordan’s “Residual Transfer Networks” to counterfactual inference. The model first uses feedforward neural networks to learn a balancing representation of students by minimizing the distance between the distributions of the control and treated populations, and then adopts a residual block to estimate individual treatment effects from the student representations.
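The two-stage structure described above can be illustrated with a minimal, untrained forward pass. This is only a sketch under stated assumptions, not the model used in this work: the layer sizes, the random toy data, the squared distance between mean representations as the distribution distance, the weight `alpha`, and the reading of the residual block as "treated outcome = control outcome + learned residual" are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Small random weights and zero biases (hypothetical initialization).
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def represent(x, layer):
    # One feedforward layer with ReLU: student features -> representation.
    w, b = layer
    return np.maximum(x @ w + b, 0.0)

def mean_distance(phi_control, phi_treated):
    # Assumed distance: squared gap between the group mean representations.
    diff = phi_control.mean(axis=0) - phi_treated.mean(axis=0)
    return float(diff @ diff)

def predict_outcomes(phi, control_head, residual_head):
    # Residual block (assumed form): y1 = y0 + residual(phi).
    wc, bc = control_head
    wr, br = residual_head
    y0 = (phi @ wc + bc).ravel()
    y1 = y0 + (phi @ wr + br).ravel()
    return y0, y1

# Toy stand-in for RCT data: features x, assignment t, observed outcome y.
n, d, h = 200, 5, 8
x = rng.normal(size=(n, d))
t = rng.integers(0, 2, size=n)
y = rng.normal(size=n)

rep_layer = init_layer(d, h)
control_head = init_layer(h, 1)
residual_head = init_layer(h, 1)

phi = represent(x, rep_layer)
imbalance = mean_distance(phi[t == 0], phi[t == 1])
y0_hat, y1_hat = predict_outcomes(phi, control_head, residual_head)

# Individual treatment effect estimate: predicted treated minus predicted control.
ite_hat = y1_hat - y0_hat

# A training objective would combine factual error with the balance penalty.
alpha = 1.0  # hypothetical trade-off weight
y_factual_hat = np.where(t == 1, y1_hat, y0_hat)
loss = float(np.mean((y_factual_hat - y) ** 2) + alpha * imbalance)
```

Under this assumed form, the estimated individual effect equals the output of the residual head, so the residual block models the treatment effect directly on top of the shared representation.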
Before students participate in these experiments, they have completed varying numbers of problems. A large amount of data on student performance history is available in ASSISTments for learning a representation of students. We propose to use each student's performance history prior to joining the experiment to learn a representation of that student, and then to incorporate these representations into the counterfactual prediction model to verify the efficacy of this approach. The control conditions of these RCTs are usually ‘business-as-usual’, so a large amount of data is also available for the control condition. These historical control data are usually discarded from the analyses so that the average treatment effect estimators remain unbiased. In this proposal, one of the research questions we pose is the following: how can we use a large amount of historical control data, in addition to RCT data, to better estimate individualized treatment effects?
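To see why historical control data are usually set aside, consider a small worked example. All scores below are made up for illustration. The historical control students were not randomized against the treated group, so they may differ systematically from the RCT population, and naively pooling them can shift the difference-in-means estimate.

```python
from statistics import mean

# Hypothetical post-test scores (fraction correct); not real ASSISTments data.
rct_treated = [0.70, 0.80, 0.60, 0.70]
rct_control = [0.60, 0.70, 0.50, 0.60]
# Historical business-as-usual students, assumed here to score lower overall.
historical_control = [0.40, 0.50, 0.30, 0.40, 0.50, 0.30]

def difference_in_means(treated, control):
    # Standard average treatment effect estimator for a randomized trial.
    return mean(treated) - mean(control)

# Uses only randomized students: about 0.10 in this toy example.
ate_rct_only = difference_in_means(rct_treated, rct_control)

# Naively pools non-randomized historical controls: about 0.22 here.
ate_pooled = difference_in_means(rct_treated, rct_control + historical_control)
```

In this toy setting the pooled estimate more than doubles the apparent effect, even though nothing about the intervention changed. The difficulty lies in using the extra control data without inheriting this kind of bias.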