Computer Science Department, MS Thesis Presentation - Botao Hu "Training Strong Bridge Bidding Agents via PPO with Privileged Information"

Monday, April 27, 2026
9:00 a.m. to 10:00 a.m.

Botao Hu

MS student

WPI – Computer Science Department 

Location: Fuller Lab 140

Zoom Link: https://wpi.zoom.us/my/botaohu

Advisor: Prof. Qi Zhang

Reader: Prof. Yanhua Li

Abstract: 

Bridge bidding is a challenging imperfect-information game requiring partners to communicate through a series of bids. We first reproduce the state-of-the-art results of previous work, confirming that proximal policy optimization (PPO) with fictitious self-play (FSP) yields agents that significantly outperform the rule-based baseline WBridge5. 

We then extend this approach by investigating two factors: privileged information and prioritized opponent sampling. We find that incorporating partner or global information into the critic network substantially improves performance in head-to-head matchups among PPO agents. Against the fixed rule-based opponent WBridge5, however, privileged agents initially underperform relative to locally trained agents.

With extended training, however, they surpass local agents, indicating that privileged information can generalize to unseen opponents given sufficient training steps. In contrast, prioritized FSP offers no advantage over uniform sampling in any of our settings. Finally, we observe that the bidding strategies learned through self-play are often opaque and incompatible with human conventions, highlighting a key challenge for real-world deployment.
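
As a rough illustration of the privileged-information idea in the abstract, the sketch below builds separate inputs for the actor (policy) and the critic: the actor sees only its own hand and the bidding history, while the critic may also see the partner's hand or all four hands. The card encoding, the placeholder history vector, the function names, and the mode labels "local", "partner", and "global" are all hypothetical choices made for this example, not details taken from the thesis.

```python
from typing import Dict, List

SEATS = ["N", "E", "S", "W"]
PARTNER = {"N": "S", "S": "N", "E": "W", "W": "E"}


def encode_hand(cards: List[int]) -> List[int]:
    """Return a 52-dim one-hot vector (card indices 0..51 are an assumed encoding)."""
    vec = [0] * 52
    for card in cards:
        vec[card] = 1
    return vec


def actor_input(hands: Dict[str, List[int]], seat: str, history: List[int]) -> List[int]:
    """Local observation: the player's own hand plus the public bidding history."""
    return encode_hand(hands[seat]) + history


def critic_input(
    hands: Dict[str, List[int]], seat: str, history: List[int], mode: str = "local"
) -> List[int]:
    """Critic observation; 'partner' and 'global' append hidden hands (assumed modes)."""
    obs = actor_input(hands, seat, history)
    if mode == "local":
        return obs
    if mode == "partner":
        return obs + encode_hand(hands[PARTNER[seat]])
    if mode == "global":
        for other in SEATS:
            if other != seat:
                obs = obs + encode_hand(hands[other])
        return obs
    raise ValueError(f"unknown mode: {mode}")


# Toy deal: each seat gets 13 consecutive card indices (illustration only).
deal = {seat: list(range(i * 13, (i + 1) * 13)) for i, seat in enumerate(SEATS)}
history = [0, 0, 0, 0]  # placeholder for an encoded bidding history

local_obs = critic_input(deal, "N", history, mode="local")
partner_obs = critic_input(deal, "N", history, mode="partner")
global_obs = critic_input(deal, "N", history, mode="global")
print(len(local_obs), len(partner_obs), len(global_obs))
```

In a setup like this, the extra information is used only by the critic during training, so the policy still acts on the local observation alone at play time.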