Robotics Engineering PhD Dissertation Defense ~ Sreejani Chatterjee
2:00 p.m. to 4:30 p.m.
Markerless and Model-free Vision-based Robot Control

In this dissertation, we present a novel visual servoing framework that controls a robotic manipulator in the configuration space using purely natural visual features. Our goal is to develop methods that robustly detect and track natural features, or keypoints, on robotic manipulators for vision-based control, particularly in scenarios where placing external markers on the robot at runtime is infeasible or undesirable.

As a first step, we establish a data collection pipeline that uses camera intrinsics, extrinsics, and forward kinematics to generate 2D projections of a robot's joint locations (keypoints) in image space, and we train a real-time keypoint detector on this data. To further reduce the reliance on accurate camera calibration and kinematic models during dataset creation, we introduce an inpainting-based training strategy: we attach ArUco markers along the robot's body, label their centers as keypoints, and modify an inpainting method to remove the markers and reconstruct the occluded regions. This yields labeled training data with significantly less dependence on explicit robot and camera models. The detected keypoints then serve as control features in an online Jacobian-estimating image-based visual servoing (IBVS) controller that regulates the robot's configuration directly from images, yielding a markerless, vision-based control pipeline that operates on natural robot features with minimal dependence on explicit robot, camera, or environment models at runtime. To achieve reliable performance under occlusions and noise, a second inpainting model reconstructs obscured regions of the robot in real time, enabling continuous keypoint detection, and an Unscented Kalman Filter (UKF) refines the keypoint estimates over time for improved consistency. Finally, to unify perception and planning, we formulate image-based roadmaps that plan collision-free paths directly in image space, eliminating the need for joint encoders or explicit robot models for collision-free planning and control.

We validate the complete planning and control pipeline on a Franka Emika Panda robot. The full system achieves closed-loop 3D control under full visibility using a single monocular camera, without robot or camera models and without depth estimation, and it remains robust to occlusion and disturbances in planar motion. To demonstrate the controller's model-free generality, we also validate the control component on a three-module Soft Origami arm. Overall, this work delivers a practical, model-free pipeline that unifies perception, planning, and control entirely in the image domain.
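To make the label-generation step concrete, the following is a minimal Python sketch, with assumed array shapes and a hypothetical function name, of how joint origins obtained from forward kinematics can be projected into the image plane using camera intrinsics and extrinsics. It is illustrative only and not the dissertation's implementation.

import numpy as np

def project_joint_keypoints(joint_positions_world, T_world_to_cam, K):
    """Project 3D joint origins (N x 3, world frame) to 2D pixel keypoints (N x 2).

    joint_positions_world: N x 3 joint origins from forward kinematics.
    T_world_to_cam:        4 x 4 extrinsic transform (world -> camera frame).
    K:                     3 x 3 camera intrinsic matrix.
    """
    n = joint_positions_world.shape[0]
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([joint_positions_world, np.ones((n, 1))])   # N x 4
    pts_cam = (T_world_to_cam @ pts_h.T).T[:, :3]                 # N x 3
    # Pinhole perspective projection and normalization by depth.
    uvw = (K @ pts_cam.T).T                                       # N x 3
    return uvw[:, :2] / uvw[:, 2:3]                               # N x 2 pixels

For a seven-joint arm, for instance, this would produce seven pixel coordinates per image that can serve as ground-truth keypoint labels for the detector.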
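Similarly, one common way to realize an online Jacobian-estimating IBVS controller is a Broyden-style secant update of the image Jacobian combined with a proportional servo law. The sketch below illustrates that general idea with assumed class and parameter names; it is not the specific estimator or gain schedule used in this work.

import numpy as np

class OnlineJacobianIBVS:
    def __init__(self, num_features, num_joints, gain=0.5, alpha=0.1):
        # Initial guess of the image Jacobian (feature velocity w.r.t. joint velocity).
        self.J = np.eye(num_features, num_joints)
        self.gain = gain     # proportional servo gain
        self.alpha = alpha   # Broyden update step size

    def update_jacobian(self, delta_s, delta_q):
        """Secant (Broyden) correction from observed feature/joint increments."""
        denom = float(delta_q @ delta_q)
        if denom > 1e-9:
            residual = delta_s - self.J @ delta_q
            self.J += self.alpha * np.outer(residual, delta_q) / denom

    def command(self, s, s_desired):
        """Joint-velocity command that drives the image-space error toward zero."""
        error = s - s_desired
        return -self.gain * np.linalg.pinv(self.J) @ error

Because the Jacobian is estimated online from observed feature and joint increments, a controller of this general form needs no analytic robot or camera model, which is what allows the same control law to transfer from the rigid manipulator to the soft arm.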
Advisor: Professor Berk Calli (RBE)
Committee: Professor Constantinos Chamzas (RBE), Professor Nitin Sanket (RBE), and Professor Chun-kit Ngan (DS)