Data Science Master of Science Thesis Presentation by Biao Yin

Wednesday, October 17, 2018
10:00 am to 11:00 am
Floor/Room #: Beckett Conference Room

‘Leveraging Randomized Control Trials in a Web-based Learning Platform to Discern Personalization’


There is a growing trend in education toward personalized learning, in both research and applications, in this era of big data. We present our research on personalization from the perspective of educational data mining, with the goal of tutoring and analyzing learners in a heterogeneous way as they learn mathematical skills online. One of the ideal ways to find effective tutoring interventions is through randomized control trials, which are often considered the gold standard for experiments. Numerous randomized control trials at scale with student-level randomization can be easily set up thanks to the flexibility of current online tutoring platforms and the efficient collaboration among real classrooms, virtual platforms, and interested researchers looking for experimental results from them. However, the effects discovered are often estimated over all students at once, relying on the averaging assumption of statistical methods.

We studied how personalized learning arises within these educational randomized control trials by observing heterogeneous treatment effects, using causal inference with selected machine learning methods. We fit Logistic Regression, Hierarchical Linear Models, traditional Decision Trees and Random Forests, and two newer algorithms, Causal Trees and Causal Forests, to better analyze the characteristics of data structured by randomized control trials. All the datasets we used come from the real performance of students learning mathematical skills in randomized control trials built on ASSISTments, an online tutoring platform developed at WPI. We not only found heterogeneous treatment effects among students with different prior mathematical backgrounds in certain strict educational experiments on the platform, but also showed the differences in performance between traditional Random Forests and Causal Forests, which account for causal inference, when applied to randomized control trials for personalized learning. These efforts and findings were summarized in two publications, in 2016 and 2017. Additionally, to further underline the importance of personalization, we provide results that narrow the confidence intervals of the original effects when administrative structures across experiments are taken into account. This thesis aims not only to provide instructional scaffolding for problem-solving tasks in the online learning platform but also to propose effective models that consider causality in structured social experiments.
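
As a rough illustration of what a heterogeneous treatment effect means in a randomized control trial, the sketch below compares the overall difference in means with the difference in means inside each prior-knowledge subgroup. It is not the Causal Tree or Causal Forest method used in the thesis, only the simplest subgroup comparison. The data are simulated, and every name and number in it (simulate_trial, the "low" and "high" groups, the 0.15 lift) is a hypothetical assumption, not a result from ASSISTments.

```python
import random
from statistics import mean


def simulate_trial(n=4000, seed=0):
    """Simulate a student-level randomized trial (hypothetical data).

    Each row is (prior, treated, correct). By assumption, the
    intervention only helps students with low prior knowledge.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        prior = rng.choice(["low", "high"])
        treated = rng.random() < 0.5          # coin-flip assignment
        base = 0.4 if prior == "low" else 0.7  # assumed baseline success rate
        lift = 0.15 if (treated and prior == "low") else 0.0
        correct = 1 if rng.random() < base + lift else 0
        rows.append((prior, treated, correct))
    return rows


def difference_in_means(rows):
    """Estimate the average treatment effect as treated mean minus control mean."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def effects_by_subgroup(rows):
    """Estimate the treatment effect separately within each prior-knowledge group."""
    groups = sorted({prior for prior, _, _ in rows})
    return {g: difference_in_means([r for r in rows if r[0] == g]) for g in groups}


rows = simulate_trial()
overall = difference_in_means(rows)
by_group = effects_by_subgroup(rows)
print(f"overall effect: {overall:.3f}")
for group, effect in by_group.items():
    print(f"effect for {group} prior knowledge: {effect:.3f}")
```

In this simulated setting the overall estimate blends a positive effect for one subgroup with a near-zero effect for the other, which is the kind of pattern an average over all students can hide. Tree-based causal methods search for such subgroups from the data rather than fixing them in advance.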

Advisor: Professor Neil Heffernan
Reader: Associate Professor Jian Zou