WPI - Computer Science Department, PhD Proposal Defense, Apiwat Ditthapron "Energy-efficient and Privacy-preserving Neural Network Models for Paralinguistic Health Assessment from Speech"

Department(s):

Computer Science

 

Apiwat Ditthapron

PhD Candidate  

WPI – Computer Science Department  

 

Monday, March 20, 2023 

Time: 4:30 PM – 6:30 PM

Location: Beckett Conference Room, Fuller Labs

Zoom link: https://wpi.zoom.us/j/95820002538

 

Committee Members

Advisor: Prof. Emmanuel Agu, Computer Science

Co-advisor: Prof. Adam Lammert, Biomedical Engineering 

Internal member: Prof. Elke Rundensteiner, Computer Science

External member: Dr. Thomas Quatieri, MIT Lincoln Laboratory

 

Abstract

Speech is an important biomarker in the clinical assessment of neurological disorders, such as Traumatic Brain Injury (TBI), as well as mental health conditions such as depression. Speech production and communication difficulties are common manifestations of disability after TBI (2% of the population), whereas speech patterns such as low pitch and monotonous speech are effective indicators of depression (8.4% of the population).

Passive speech monitoring between hospital visits using mobile devices can reduce the healthcare burden and rehospitalization because it requires minimal subject participation while detecting various disorders with accuracy competitive with conventional approaches that require active subject engagement. We have previously published research with encouraging results, including an Artificial Neural Network (ANN) method for passive TBI assessment from speech that uses a Cascaded Gated Recurrent Unit (cGRU) to classify parametrized Sinc (pSinc) features extracted from conversational speech gathered on smartphones. However, several critical problems posed by mobile environments and by the use of smartphones to continuously collect and process speech remain unsolved. These include energy efficiency, additional noise and other speakers in mobile environments, and maintaining speaker privacy in public spaces.

To address these issues in passive speech assessment using smartphones, three novel solutions are proposed in this dissertation:

  1. Energy efficiency: An energy-efficient masking kernel that reduces battery consumption by using gradient descent to determine the optimal length and sampling rate of speech recordings.
  2. Noise and other speakers: A Gaussian-based contrastive learning method to detect and mitigate environmental factors such as noise and cross-talk from non-target speakers in public spaces.
  3. Privacy preservation: A distilled ANN with adversarial learning that extracts privacy-preserving speech features on smartphones and formulates speaker and speech recognition as adversarial tasks.