Data Science Master of Science Thesis Presentation by Biao Yin

Wednesday, October 17, 2018
10:00 am to 11:00 am
Floor/Room #: Beckett Conference Room

TITLE:
'Leveraging Randomized Control Trials in a Web-based Learning Platform to Discern Personalization.'

ABSTRACT

In this big data era, there is enormous demand for customization from both research and applications. We present our research on personalization from the perspective of educational data mining, with the goal of tutoring and analyzing learners in a heterogeneous way as they learn skills online.

One of the ideal ways to find effective tutoring interventions is via randomized control trials, which are often considered the gold standard. Numerous randomized control trials at scale with student-level randomization can be easily set up thanks to the flexibility of current online educational platforms and the efficient integration of real classrooms, virtual platforms, and interested researchers who are looking for experimental results from them. However, reported treatment effects are often computed over all students, relying on the averaging assumption of statistical methods, which makes it hard for researchers to obtain correct or significant results, especially with small datasets.

We researched how personalized learning happens in these educational randomized control trials by observing heterogeneous treatment effects based on causal inference, using certain machine learning methods. We adapted Logistic Regression, Hierarchical Linear Models, traditional Decision Trees and Random Forests, and two innovative algorithms, called Causal Trees and Causal Forests, to better analyze data characteristics arising from the structure of randomized control trials. All the datasets we used come from the real performance of students learning mathematical skills in randomized control trials built upon a WPI-developed online learning platform called ASSISTments.

We not only found heterogeneous treatment effects among students with different prior mathematical backgrounds in certain strict educational experiments on the platform, but also showed the differences in performance between traditional Random Forests and Causal Forests, which account for causal inference, when applied to randomized control trials for personalized learning. These efforts and findings were summarized in two publications, in 2016 and 2017.

Additionally, we report results obtained by subgrouping the datasets according to their administrative structure. After accounting for intrinsic group differences arising from students who overlap across classes, teachers, or even multiple experiments, we narrowed the confidence intervals that researchers had previously reported for the effects, which also underscores the importance of personalization in online educational randomized control trials. This work aims not only to provide instructional scaffolding for problem-solving tasks in computer-based learning systems but also to propose effective models that consider causality in structured social experiments.

Advisor: Neil Heffernan
Reader: Jian Zou