Non-Markov Decision Processes and Reinforcement Learning

Lecturers

Giuseppe De Giacomo, degiacomo@diag.uniroma1.it

Luca Iocchi, iocchi@diag.uniroma1.it

Fabio Patrizi, patrizi@diag.uniroma1.it

Alessandro Ronca, ronca@diag.uniroma1.it

Roberto Cipollone (guest lecturer), cipollone@diag.uniroma1.it

Gabriel Paludo Licks (guest lecturer), licks@diag.uniroma1.it

Elena Umili (guest lecturer), umili@diag.uniroma1.it

Content and organization

We present non-Markov decision processes, where rewards and dynamics can depend on the history of events. This is in contrast with Markov Decision Processes, where the dependency is limited to the last state and action. We study how to specify non-Markov reward functions and dynamics functions using Linear Temporal Logic on finite traces. The resulting decision processes are called Regular Decision Processes, and we show how to solve them by extending solution techniques for Markov Decision Processes. Then, we turn to Reinforcement Learning. First, we study the Restraining Bolt, a device that enables an agent to learn a specified non-Markov behaviour while relying on the Markov property. Second, we study how an agent can achieve optimal behaviour in a non-Markov domain by learning a finite-state automaton that describes rewards and dynamics. Specifically, we will cover the following topics: MDPs with Non-Markov Rewards, Non-Markov Dynamics, Regular Decision Processes, Restraining Bolts, Linear Temporal Logic on finite traces as a reward/dynamics specification language, Reinforcement Learning, Deep Reinforcement Learning, and Automata Learning. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.

Level

PhD

Course Duration

20 hours

Course Type

Semester Course

Participation terms

If you are an AIDA Student, complete the following two steps: (a) register at the link https://forms.gle/Nua6Sef2xgmtsbGs8; (b) also enroll in the same course in the AIDA system, in order for this course to be included on your AIDA Course Attendance Certificate. If you are not an AIDA Student, complete only step (a).

Schedule

Mondays and Fridays from 11:00 to 13:00 CET (Slot 1) and from 14:00 to 16:00 CET (Slot 2).

Language

English

Modality (online/in person):

Blended (online and in person)

Notes

We, as instructors, will give exams only to students in Sapienza’s PhD curricula. However, we are happy to discuss the course topics informally with anyone interested in them.

Host Institution
Sapienza University of Rome
