Speaker
Giovanni Varricchione
Utrecht University
Talks at this conference:
Wednesday, 14:25, J336
Pure-Past Action Masking: constraining Reinforcement Learning via pure-past LTL
Authors: Giovanni Varricchione, Natasha Alechina, Mehdi Dastani, Giuseppe De Giacomo, Brian Logan and Giuseppe Perelli

We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for reinforcement learning (RL). In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL) [3]. PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system rather than just the current state of the environment. The features used in the safety constraints need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and the reward specification of the (learning) agent. Via [4], we prove formally that PPAM is as expressive as shields [1], another approach to enforcing non-Markovian constraints in RL. Then, thanks to a result from [2], we show that PPAM incurs only a single-exponential blowup, instead of the double-exponential one of shields.
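To illustrate the idea behind the abstract, here is a minimal sketch (not the authors' implementation) of history-dependent action masking. It uses a hypothetical constraint "the agent may `open` only if `collect` has occurred at some point in the past" (in PPLTL terms, `open -> Once collect`). The key property of pure-past formulas is that they can be evaluated incrementally over the history with a fixed amount of state, here a single bit:

```python
class OnceCollectMask:
    """Tracks the pure-past subformula `Once collect` with one bit of state."""

    def __init__(self):
        self.once_collect = False  # truth value of `Once collect` on the history so far

    def update(self, action):
        # Fixpoint characterization: Once p  <=>  p or Yesterday(Once p),
        # so the bit is set when `collect` happens and then stays set.
        if action == "collect":
            self.once_collect = True

    def allowed(self, actions):
        # Mask `open` whenever `Once collect` does not yet hold.
        return [a for a in actions if a != "open" or self.once_collect]


mask = OnceCollectMask()
acts = ["move", "collect", "open"]
print(mask.allowed(acts))   # ['move', 'collect'] -- 'open' is masked
mask.update("collect")
print(mask.allowed(acts))   # ['move', 'collect', 'open'] -- constraint satisfied
```

The separation of concerns mentioned in the abstract is visible here: the mask only sees the action history (the `collect` events), not the agent's state features or rewards.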