Computer Science PhD Research Qualifier, Zeqian Li "Independent vs. Sequential Embeddings for Speaker Verification & Diarization"

Thursday, May 02, 2019
4:30 pm to 5:30 pm

Zeqian Li

PhD Student

WPI – Computer Science


Location: Atwater Kent 232


Advisor: Prof. Jake Whitehill

Co-Advisor: Prof. Xiangnan Kong



Using the DIHARD competition development dataset, the VoxCeleb dataset, and the LibriSpeech ASR corpus, we investigated three components of the diarization process:

 (1) We compared the performance of deep embedding models built on different neural architectures (VGG, LSTM, Transformer) and trained on different datasets.

 (2) Using an acoustical simulator, we evaluated a method of data augmentation by simulating different distances between the speakers and the recording microphone.

 (3) We explored an alternative to the baseline approach, in which embeddings are computed independently within each time window: we developed a novel sequential approach that jointly analyzes all the utterances within each audio clip. We present some evidence that this approach can, under certain conditions, outperform the independent embedding approach.
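
The distinction in item (3) can be illustrated with a toy sketch. This is not the model presented in the talk: the function names, the mean-pooling "embedding", and the softmax-attention mixing step are all assumptions chosen only to show the difference between embedding each window on its own and embedding all windows of a clip jointly.

```python
import math
import random

def mean_vector(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def independent_embeddings(windows):
    """Hypothetical baseline: each window is embedded using only its own frames."""
    return [mean_vector(frames) for frames in windows]

def sequential_embeddings(windows):
    """Hypothetical joint variant: each window's embedding is a softmax-weighted
    mix of the embeddings of all windows in the clip (self-attention style)."""
    base = independent_embeddings(windows)
    dim = len(base[0])
    out = []
    for query in base:
        # Scaled dot-product similarity between this window and every window.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in base]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * vec[d] for w, vec in zip(weights, base))
                    for d in range(dim)])
    return out

# Synthetic clip (made-up data): 5 windows, 20 frames each, 8 features per frame.
random.seed(0)
clip = [[[random.gauss(0, 1) for _ in range(8)] for _ in range(20)]
        for _ in range(5)]
ind = independent_embeddings(clip)
seq = sequential_embeddings(clip)
```

In this sketch, altering one window leaves the independent embeddings of the other windows unchanged, whereas every sequential embedding can shift, because each one depends on the whole clip.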