DS Ph.D. Proposal Defense | Palawat Busaranuvong | Wednesday, June 25th @ 9:30AM, UH 471 | Multimodal Large Language Models for Automated Wound Infection Assessment from Images: Classification, Captioning and Reasoning

PhD Dissertation Proposal Defense

Student Name: Palawat Busaranuvong

Date: June 25, 2025 (9:30 AM - 11:30 AM)

Location: UH 471 - Unity Hall Conference Room 

Committee:

  • Prof. Emmanuel O. Agu – WPI Computer Science Department (Advisor)
  • Prof. Fabricio Murai – WPI Computer Science Department
  • Prof. Bengisu Tulu – WPI The Business School
  • Prof. Apiwat Ditthapron – Rajamangala University of Technology Krungthep, Thailand

Title: Multimodal Large Language Models for Automated Wound Infection Assessment from Images: Classification, Captioning and Reasoning

Abstract

Infection of chronic wounds, including Diabetic Foot Ulcers (DFUs) and pressure, arterial, venous, surgical, and trauma-related wounds, presents a major public health challenge. Current diagnosis relies on in-clinic debridement, evaluation by specialists, and blood tests, processes that are often unavailable in Point of Care (POC) settings such as patients' homes. At the POC, non-specialists often rely on visual cues alone, which frequently results in errors, limb amputations, unnecessary referrals, and higher healthcare costs. To address these challenges, this dissertation proposes a smartphone-based wound assessment system that (1) classifies infection from wound images, (2) performs clinical reasoning about the wound infection to enhance trust and interpretability, and (3) generates structured clinical notes from nurse-patient conversations.

First, we introduce the Guided Conditional Diffusion Classifier (ConDiff), an image-based model that leverages guided image synthesis, denoising diffusion, and contrastive learning to improve infection classification by comparing the embeddings of synthetic and original images. Second, we propose SCARWID, a multimodal framework that augments wound images with captions generated by fine-tuned Vision Language Models (VLMs) and fuses the two modalities with cross-attention to improve classification robustness. Third, Multimodal Large Language Models (MLLMs) are explored for generating clinical reasoning and improving interpretability across diverse wound types; we adopt a Reinforcement Learning from Verifiable Rewards (RLVR) approach that enables MLLMs to learn such reasoning from minimal wound data and without explicit physician annotations. Finally, structured clinical note generation transcribes audio of nurse-patient dialogues, where identifying "who says what" remains a challenge. To improve speaker verification (SV) for such transcriptions, we propose the Multi-Scale Spectral Attention Res2NeXt model, which incorporates discrete Fourier transforms and adaptive frequency filtering to boost SV accuracy while reducing computational complexity.
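
As a rough illustration of the distance-based classification idea mentioned above (comparing synthetic and original image embeddings), the sketch below generates a class-conditioned synthetic image per candidate label and picks the label whose synthetic image is closest to the original in embedding space. This is a minimal sketch only: the `encoder` and `generate_conditioned` callables are hypothetical placeholders, not the actual ConDiff implementation.

```python
# Minimal sketch (not the actual ConDiff code): classify a wound image by
# generating a class-conditioned synthetic image per label and choosing the
# label whose synthetic image is closest to the original in embedding space.
# `encoder` and `generate_conditioned` are hypothetical placeholder callables.
import torch
import torch.nn.functional as F

def classify_by_embedding_distance(wound_image, encoder, generate_conditioned,
                                   labels=("infected", "uninfected")):
    with torch.no_grad():
        original_emb = encoder(wound_image)              # (1, d) image embedding
        distances = []
        for label in labels:
            synthetic = generate_conditioned(wound_image, label)  # guided synthesis
            synthetic_emb = encoder(synthetic)
            # cosine distance between original and synthetic embeddings
            distances.append(1.0 - F.cosine_similarity(original_emb, synthetic_emb).item())
    # the closest-matching synthetic image indicates the predicted infection label
    return labels[min(range(len(labels)), key=lambda i: distances[i])]
```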

Department(s): Data Science

Contact Person: Kelsey Briggs
