Computer Science Department, MS Thesis Presentation - Botao Hu "Training Strong Bridge Bidding Agents via PPO with Privileged Information"
Botao Hu
MS student
WPI – Computer Science Department
Monday, April 27th, 2026
Time: 9:00 AM – 10:00 AM
Location: Fuller Lab 140
Zoom Link: https://wpi.zoom.us/my/botaohu
Advisor: Prof. Qi Zhang
Reader: Prof. Yanhua Li
Abstract:
Bridge bidding is a challenging imperfect-information game requiring partners to communicate through a series of bids. We first reproduce the state-of-the-art results of previous work, confirming that proximal policy optimization (PPO) with fictitious self-play (FSP) yields agents that significantly outperform the rule-based baseline WBridge5.
We then extend this approach by investigating two factors: privileged information and prioritized opponent sampling. We find that incorporating partner or global information into the critic network substantially improves performance in head-to-head matchups among PPO agents, but against the fixed rule-based opponent WBridge5, privileged agents initially underperform relative to locally trained agents.
However, with extended training the privileged agents surpass the local agents, indicating that privileged information can generalize to unseen opponents given sufficient training steps. In contrast, prioritized FSP offers no advantage over uniform sampling in any of our settings. Finally, we observe that the bidding strategies learned through self-play are often opaque and incompatible with human conventions, highlighting a key challenge for real-world deployment.