WPI – Computer Science
Thursday, May 2nd, 2019
Time: 4:30 p.m. – 5:30 p.m.
Location: Atwater Kent 232
Advisor: Prof. Jake Whitehill
Co-Advisor: Prof. Xiangnan Kong
Using the DIHARD competition development dataset, the VoxCeleb dataset, and the LibriSpeech ASR corpus, we investigated three components of the diarization process:
(1) We compared the performance of deep embedding models built on different neural architectures (VGG, LSTM, Transformer) and trained on different datasets.
(2) Using an acoustical simulator, we evaluated a data augmentation method that simulates different distances between the speakers and the recording microphone.
(3) We explored an alternative to the baseline approach, in which embeddings are computed independently within each time window. In its place, we developed a novel sequential approach that jointly analyzes all the utterances within each audio clip. We present some evidence that this approach can, under certain conditions, outperform the independent embedding approach.
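
As a rough illustration of the distance-based augmentation described in (2), the sketch below uses a deliberately simplified free-field model: the signal is delayed by the propagation time and attenuated in inverse proportion to distance. This is not the acoustical simulator used in this work, which would typically also model room reverberation; the function names, the 1 m reference distance, and the speed-of-sound constant are assumptions made for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature (assumed)


def simulate_distance(signal, sample_rate, distance_m, ref_distance_m=1.0):
    """Hypothetical helper: free-field approximation of a source at distance_m.

    Delays the signal by distance / c and scales its amplitude by
    ref_distance / distance (inverse-distance law). No reverberation or
    noise is modeled.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    delay = int(round(distance_m / SPEED_OF_SOUND * sample_rate))  # in samples
    gain = ref_distance_m / distance_m
    out = np.zeros(len(signal) + delay, dtype=np.float64)
    out[delay:] = gain * np.asarray(signal, dtype=np.float64)
    return out


def augment(signal, sample_rate, distances_m):
    """Return one simulated copy of the signal per requested distance."""
    return {d: simulate_distance(signal, sample_rate, d) for d in distances_m}
```

Calling `augment(clip, 16000, [0.5, 1.0, 2.0, 4.0])` on a hypothetical 16 kHz clip would yield four copies of the same utterance at different simulated distances, which could then be added to the training set of an embedding model.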