BEGIN:VCALENDAR
CALSCALE:GREGORIAN
VERSION:2.0
METHOD:PUBLISH
PRODID:-//Drupal iCal API//EN
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
DTSTART:20070311T020000
TZNAME:EDT
TZOFFSETTO:-0400
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
DTSTART:20071104T020000
TZNAME:EST
TZOFFSETTO:-0500
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
SEQUENCE:1
X-APPLE-TRAVEL-ADVISORY-BEHAVIOR:AUTOMATIC
UID:234816
DTSTAMP:20260416T160515Z
DTSTART;TZID=America/New_York:20260427T090000
DTEND;TZID=America/New_York:20260427T100000
URL;TYPE=URI:https://www.wpi.edu/news/calendar/events/compu
 ter-science-department-ms-thesis-presentation-botao-hu-training-strong-bri
 dge-bidding-agents
SUMMARY:Computer Science Department, MS Thesis Presentation - Botao Hu "Training Strong Bridge Bidding Agents via PPO with Privileged Information"
DESCRIPTION:Botao Hu\nMS student\nWPI – Computer Science Department\nMonday, April 27th, 2026\nTime:
  9:00 AM – 10:00 AM\nLocation: Fuller Lab 140\n\nZoom Link:https://wpi.z
 oom.us/my/botaohu\nAdvisor: Prof. Qi Zhang\nReader: Prof. Yanhua Li\nAbstr
 act:\nBridge bidding is a challenging imperfect-information game requiring
  partners to communicate through a series of bids. We first reproduce the 
 state-of-the-art results of previous work, confirming that proximal policy
  optimization (PPO) with fictitious self-play (FSP) yields agents that sig
 nificantly outperform the rule-based baseline WBridge5.\nWe then extend th
 eir approach by investigating two factors: privileged information and prio
 ritized opponent sampling. We find that incorporating partner or global in
 formation into the critic network substantially improves performance in he
 ad-to-head matchups among PPO agents, but against the fixed rule-based opp
 onent WBridge5, privileged agents initially underperform relative to local
 ly trained agents.\nHowever, with extended training they surpass local age
 nts, indicating that privileged information can generalize to unseen oppon
 ents given sufficient steps. In contrast, prioritized FSP offers no advant
 age over uniform sampling in any of our settings. Finally, we observe that
  the bidding strategies learned through self-play are often opaque and inc
 ompatible with human conventions, highlighting a key challenge for real-wo
 rld deployment.\n
END:VEVENT
END:VCALENDAR