RBE PhD DISSERTATION PROPOSAL
Optimal control and reinforcement learning for stochastic systems under temporal logic specifications
Friday, July 30, 2021
10:00 AM - 11:00 AM
Virtual | Zoom: https://wpi.zoom.us/j/92742209854?from=addon
Abstract: We commonly encounter stochastic dynamic systems in various application domains, such as robotics, defense operations, and other cyber-physical systems. Applications differ in their high-level specifications; that is, stochastic dynamic systems must satisfy requirements imposed by different high-level specifications. These specifications include liveness (something good will always eventually happen), safety (nothing bad will ever happen), and fairness (all constituent processes will make progress and none will starve). Given such diverse properties, how can we design control policies for stochastic dynamic systems that satisfy them? And how can we synthesize these policies efficiently?
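The three property classes above are commonly written in temporal logic. A minimal sketch in LTL notation, where the atomic propositions (goal, collision, request_i, serve_i) are illustrative placeholders and not taken from the proposal:

```latex
% Liveness: the system always eventually reaches a goal state
\mathbf{G}\,\mathbf{F}\,\mathit{goal}

% Safety: a collision never occurs
\mathbf{G}\,\neg\mathit{collision}

% (Strong) fairness: any process requesting infinitely often
% is served infinitely often
\mathbf{G}\,\mathbf{F}\,\mathit{request}_i \;\rightarrow\; \mathbf{G}\,\mathbf{F}\,\mathit{serve}_i
```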
This thesis proposal presents a comprehensive probabilistic planning framework for stochastic dynamic systems under high-level specifications.

The first contribution is an efficient planning method comprising two components: a novel reinforcement learning algorithm and a policy learning framework. The reinforcement learning algorithm uses on-policy sampling to efficiently learn a near-optimal value function whose corresponding policy satisfies a temporal objective expressed in Probabilistic Computation Tree Logic (PCTL). The policy learning framework incorporates an algorithm that reveals topological information in formal specifications expressed in Linear Temporal Logic (LTL) and leverages that knowledge to guide policy learning.

The second contribution extends probabilistic planning with LTL objectives to adversarial interactions. We introduce a class of hypergame models that capture such interactions in the presence of asymmetric, incomplete information, and we establish a solution concept for this class of hypergames.

Finally, we extend Metric Interval Temporal Logic (MITL) with a distribution eventuality operator. The extended logic allows us to jointly reason about the probabilistic occurrence of external events and the intended system behavior. We propose a systematic approach to translate the extended MITL into a formalism suitable for planning, and employing that formalism, we develop a near-optimal planning method with a bounded error guarantee.
PhD Committee Members:
Prof. Jie Fu (Advisor, Committee Chair), RBE, WPI
Prof. Andrew Clark, ECE, WPI
Prof. Raghvendra Cowlagi, AE, WPI
Prof. Carlo Pinciroli, RBE/CS, WPI