Computer Science Department, MS Thesis Presentation, Esther Unekwuojo Agbaji

Friday, April 26, 2019
11:00 am to 12:00 pm
Floor/Room #: Beckett Conference Room

Brian Zylich

MS Student

WPI – Computer Science

Thursday, April 25, 2019

Time: 12:00 p.m. – 1:00 p.m.

Location: Beckett Conference Room, Fuller Labs

Advisor: Prof. Jacob Whitehill

Reader: Prof. Gillian Smith

Abstract:

We explore how to automatically detect specific phrases in audio from noisy, multi-speaker videos using deep neural networks. Specifically, we focus on classroom observation videos that contain a few adult teachers and several small children (< 5 years old). At any point in these videos, multiple people may be talking, shouting, crying, or singing simultaneously. Our goal is to recognize polite speech phrases such as "Good job", "Thank you", "Please", and "You're welcome", as the occurrence of such speech is one of the behavioral markers used in classroom observation coding via the Classroom Assessment Scoring System (CLASS) protocol.

Commercial speech recognition services such as Google Cloud Speech are impractical because of data privacy concerns. Therefore, we train and test our own custom models using a combination of publicly available classroom videos from YouTube and a private dataset of real classroom observation videos collected by our colleagues at the University of Virginia. We also crowdsource an additional 1,152 recordings of polite speech phrases to augment our training dataset. Our contributions are the following:

(1) we design a crowdsourcing task for efficiently labeling speech events in classroom videos,

(2) we develop a neural network-based architecture for speech recognition that is robust to noise and overlapping speech,

(3) we explore methods to synthesize new and authentic audio data, both to increase the training set size and to reduce the class imbalance, and

(4) finally, using our trained polite speech detector, we investigate the relationship between polite speech and CLASS scores and enable teachers to visualize their use of polite language.