Data Science Master of Science Thesis Presentation by Biao Yin
'Leveraging Randomized Control Trials in a Web-based Learning Platform to Discern Personalization.'
It is widely recognized that, in this big data era, there is enormous demand for personalization from both research and applications. We present our research on personalization from the perspective of educational data mining, with the goal of helping to tutor and analyze learners heterogeneously as they learn skills online.
Randomized control trials, often considered the gold standard, are one of the best ways to find effective tutoring interventions. Large-scale randomized control trials with student-level randomization can now be set up easily, thanks to the flexibility of current online educational platforms and their efficient connection of real classrooms, virtual platforms, and researchers interested in experimental results. However, the reported treatment effects are usually averaged over all students, as standard statistical methods assume, which makes it hard for researchers to obtain correct or significant results, especially with small datasets.
We studied how personalized learning arises intrinsically in these educational randomized control trials by looking for heterogeneous treatment effects with machine learning methods for causal inference. We adapted Logistic Regression, Hierarchical Linear Models, traditional Decision Trees and Random Forests, and two newer algorithms, Causal Trees and Causal Forests, to better analyze the data structures produced by randomized control trials. All of our datasets come from the real performance of students learning mathematical skills in randomized control trials built on ASSISTments, a WPI-developed online learning platform.
We found heterogeneous treatment effects among students with different prior mathematical backgrounds in certain strictly controlled experiments on the platform, and we also showed how traditional random forests and causal forests, which are designed for causal inference, differ in performance when applied to randomized control trials for personalized learning. These efforts and findings were summarized in two publications, in 2016 and 2017.
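As an illustration of what a subgroup-level (heterogeneous) treatment effect means, the sketch below estimates a separate difference in means, with a normal-approximation 95% confidence interval, for each prior-knowledge subgroup. It is a deliberately simple stand-in, not the causal tree or causal forest methods used in this work; the subgroup labels, the 0/1 completion outcome, and the records themselves are hypothetical.

```python
import math
from collections import defaultdict


def diff_in_means(treated, control):
    """Difference in mean outcomes (treated - control) with a 95% CI.

    Uses an unpooled (Welch-style) standard error; each arm needs >= 2 students.
    """
    mt = sum(treated) / len(treated)
    mc = sum(control) / len(control)
    vt = sum((y - mt) ** 2 for y in treated) / (len(treated) - 1)
    vc = sum((y - mc) ** 2 for y in control) / (len(control) - 1)
    se = math.sqrt(vt / len(treated) + vc / len(control))
    est = mt - mc
    return est, (est - 1.96 * se, est + 1.96 * se)


def subgroup_effects(records):
    """records: iterable of (subgroup_label, treated_flag, outcome)."""
    groups = defaultdict(lambda: ([], []))  # label -> (treated outcomes, control outcomes)
    for label, treated, outcome in records:
        groups[label][0 if treated else 1].append(outcome)
    return {label: diff_in_means(t, c) for label, (t, c) in groups.items()}


# Hypothetical data: (prior-knowledge subgroup, assigned to treatment?, completed assignment?)
example = [
    ("low_prior", 1, 1), ("low_prior", 1, 1), ("low_prior", 1, 1), ("low_prior", 1, 0),
    ("low_prior", 0, 0), ("low_prior", 0, 0), ("low_prior", 0, 1), ("low_prior", 0, 0),
    ("high_prior", 1, 1), ("high_prior", 1, 1), ("high_prior", 1, 0), ("high_prior", 1, 1),
    ("high_prior", 0, 1), ("high_prior", 0, 1), ("high_prior", 0, 1), ("high_prior", 0, 0),
]
effects = subgroup_effects(example)
```

In this toy data the intervention helps the low-prior subgroup (estimated effect 0.5) but not the high-prior one (0.0). Pooling both subgroups would average these into a single smaller effect, which is the averaging problem described above.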
Additionally, we report results from subgrouping the datasets according to their administrative structure. After accounting for intrinsic group differences arising from students who overlap across classes, teachers, or even multiple experiments, we narrowed the confidence intervals that researchers had previously reported for these effects, which further underscores the importance of personalization in online educational randomized control trials. This work aims not only to provide instructional scaffolding for problem-solving tasks in computer-based learning systems but also to propose effective models that account for causality in structured social experiments.
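To show why accounting for group structure can narrow a confidence interval, the sketch below compares a naive pooled estimate with a class fixed-effects estimate (OLS of outcome on treatment plus class indicators, computed by demeaning within each class). Fixed effects are an assumption chosen for brevity; the thesis's own models (for example, hierarchical linear models) may handle the grouping differently. The class IDs and scores are hypothetical.

```python
import math
from collections import defaultdict


def within_class_effect(rows):
    """OLS treatment effect with class fixed effects, via within-class demeaning.

    rows: list of (class_id, treated 0/1, outcome). Returns (estimate, standard error).
    Putting every row in a single class reduces this to pooled OLS with an intercept.
    """
    by_class = defaultdict(list)
    for cls, t, y in rows:
        by_class[cls].append((t, y))
    t_dm, y_dm = [], []
    for members in by_class.values():
        mt = sum(t for t, _ in members) / len(members)
        my = sum(y for _, y in members) / len(members)
        for t, y in members:
            t_dm.append(t - mt)  # treatment minus its class mean
            y_dm.append(y - my)  # outcome minus its class mean
    sxx = sum(t * t for t in t_dm)
    beta = sum(t * y for t, y in zip(t_dm, y_dm)) / sxx
    rss = sum((y - beta * t) ** 2 for t, y in zip(t_dm, y_dm))
    dof = len(rows) - len(by_class) - 1  # one parameter per class mean, plus beta
    return beta, math.sqrt(rss / dof / sxx)


# Hypothetical scores: class "A" scores far higher than class "B"; treatment adds ~1 point in both.
rows = [("A", 1, 11), ("A", 1, 12), ("A", 0, 10), ("A", 0, 11),
        ("B", 1, 2), ("B", 1, 3), ("B", 0, 1), ("B", 0, 2)]
fe_beta, fe_se = within_class_effect(rows)
naive_beta, naive_se = within_class_effect([("all", t, y) for _, t, y in rows])
```

Both estimates equal 1.0 here, but the naive standard error (about 3.70) is much larger than the fixed-effects one (about 0.45). The reason is that the large difference between classes is left in the naive residuals and is removed once class membership is taken into account.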
Advisor: Neil Heffernan; Reader: Jian Zou