
WPI - Computer Science Department, PhD Defense, Zeqian Li, "Compositional Embeddings and Their Applications in Speaker Diarization"

Friday, July 01, 2022 to Saturday, July 02, 2022
1:00 pm

 Zeqian Li 

PhD Candidate 

WPI - Computer Science 



Compositional Embeddings and Their Applications in Speaker Diarization



Friday 07/01/2022, 1:00 pm ET


Zoom link:



Committee members:


Prof. Jacob Whitehill, Advisor, WPI - Computer Science

Prof. Xiangnan Kong, WPI - Computer Science

Dr. Mike Mozer, Google

Dr. Jangwon Kim, Amazon


Our work focuses on the task of speaker diarization, and our main contributions are as follows: 1) we compare loss functions used to train speaker embedding models and identify the best ones for different scenarios; 2) we propose a method called compositional embedding that enables an embedding to represent a set of classes instead of just one; this approach is integrated into a speaker diarization pipeline and achieves a state-of-the-art result on a public benchmark, AMI-Headset Mix, with a diarization error rate (DER) of 22.19% compared with the previous 23.82%; 3) we introduce the problem of compositional clustering, which both partitions data into clusters and models their compositional relationships; measured with the compositional Rand index (CRI), a metric that we designed for this new problem, the proposed method achieves 96.2% on a dataset of 15,000 samples created from LibriSpeech, compared with the best baseline's 88.4%, and 96.6% on a dataset created from OmniGlot, compared with the best baseline's 88.7%; 4) finally, we introduce a method that uses a speaker embedding model trained in the compositional embedding manner to improve an enrollment-based speaker diarization system.
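For readers unfamiliar with the DER figures quoted in the abstract, the following is a minimal sketch of how diarization error rate is conventionally defined: the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The function names and the example durations are hypothetical, and the abstract does not state the scoring settings used in the thesis (for example, collar size or overlap handling), so this should be read as an illustration of the standard definition rather than the author's evaluation code.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction.

    All arguments are durations in seconds. This follows the conventional
    definition; scoring details such as collars are not modeled here.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline


# Hypothetical durations, for illustration only (not taken from the thesis).
der = diarization_error_rate(
    missed=30.0, false_alarm=12.0, confusion=18.0, total_speech=600.0
)
print(f"DER = {der:.2%}")

# The two DER values reported in the abstract (in percent).
print(f"Relative DER reduction = {relative_reduction(23.82, 22.19):.1%}")
```

Expressing the improvement relative to the earlier result is one common way to compare DER values obtained on the same benchmark.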