Non-Markov Decision Processes and Reinforcement Learning

Lecturers

Giuseppe De Giacomo, degiacomo@diag.uniroma1.it

Luca Iocchi, iocchi@diag.uniroma1.it

Fabio Patrizi, patrizi@diag.uniroma1.it

Alessandro Ronca, ronca@diag.uniroma1.it

Roberto Cipollone (guest lecturer), cipollone@diag.uniroma1.it

Gabriel Paludo Licks (guest lecturer), licks@diag.uniroma1.it

Elena Umili (guest lecturer), umili@diag.uniroma1.it

Content and organization

We present non-Markov decision processes, where rewards and dynamics can depend on the history of events. This is in contrast with Markov Decision Processes, where the dependency is limited to the last state and action. We study how to specify non-Markov reward functions and dynamics functions using Linear Temporal Logic on finite traces. The resulting decision processes are called Regular Decision Processes, and we show how to solve them by extending solution techniques for Markov Decision Processes. Then, we turn to Reinforcement Learning. First, we study the Restraining Bolt, a device that enables an agent to learn a specified non-Markov behaviour while relying on the Markov property. Second, we study how an agent can achieve an optimal behaviour in a non-Markov domain by learning a finite-state automaton that describes rewards and dynamics. Specifically, we will cover the following topics: MDPs with Non-Markov Rewards, Non-Markov Dynamics, Regular Decision Processes, Restraining Bolts, Linear Temporal Logic on finite traces as a reward/dynamics specification language, Reinforcement Learning, Deep Reinforcement Learning, and Automata Learning. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
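
As a rough illustration of the idea described above (not taken from the course material), the sketch below shows a reward that depends on the history of events rather than on the last event alone. It assumes a hypothetical task, "observe 'a', and later observe 'b'", tracked by a small hand-written finite-state automaton; the names `TRANSITIONS`, `dfa_step` and `rewards_along` are invented for this example. Pairing the environment state with the automaton state is one way the reward can be made Markov again.

```python
from typing import Dict, List, Tuple

# Hypothetical automaton for the task "see 'a', and later see 'b'".
# States: 0 = nothing seen, 1 = 'a' seen, 2 = 'a' then 'b' seen (accepting).
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "b"): 2,
}
ACCEPTING = {2}


def dfa_step(q: int, symbol: str) -> int:
    """Advance the automaton; unlisted (state, symbol) pairs self-loop."""
    return TRANSITIONS.get((q, symbol), q)


def rewards_along(trace: List[str]) -> List[float]:
    """Give reward 1.0 on the step where the automaton first accepts."""
    q = 0
    rewards = []
    for symbol in trace:
        q_next = dfa_step(q, symbol)
        first_accept = q_next in ACCEPTING and q not in ACCEPTING
        rewards.append(1.0 if first_accept else 0.0)
        q = q_next
    return rewards


# The same last event 'b' is rewarded or not depending on the history.
print(rewards_along(["b", "a", "c", "b"]))
print(rewards_along(["b", "b"]))
```

In this sketch the reward is a function of the pair (automaton state, current event), so an agent that observes the automaton state alongside the environment state faces an ordinary Markov reward.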

Level

PhD

Course Duration

20 hours

Course Type

Semester Course

Participation terms

If you are an AIDA Student, complete the following two steps: (a) register at the link https://forms.gle/Nua6Sef2xgmtsbGs8; (b) also enroll in the same course in the AIDA system, in order for this course to be included on your AIDA Course Attendance Certificate. If you are not an AIDA Student, complete only step (a).

Schedule

Mondays and Fridays from 11:00 to 13:00 CET (Slot 1) and from 14:00 to 16:00 CET (Slot 2).

Language

English

Modality (online/in person):

Blended (online and in person)

Notes

As instructors, we will give exams only to students of Sapienza’s PhD curricula. However, we are happy to discuss the course topics informally with anyone interested in them.

Host Institution
Sapienza University of Rome
