VOILA! Agents of Chaos and the Mission of Mechanistic Interpretability by Dr. Natalie Shapira

VOILA! AI and Agents

Title

VOILA! Agents of Chaos and the Mission of Mechanistic Interpretability

Lecturer

Dr. Natalie Shapira (Khoury College of Computer Sciences, Northeastern University), https://www.khoury.northeastern.edu/people/natalie-shapira/

Content and organization

Recent advances in AI have led to increasingly autonomous systems exhibiting what is often referred to as agentic behavior: capabilities that include goal-directed planning, adaptation of strategies, decision making, and interaction with complex environments. While such capabilities are promising, they also introduce potential risks, including misalignment and unintended emergent behaviors that are difficult to anticipate or control.

In this talk, I highlight how agentic models can exhibit failure modes that resemble “agents of chaos,” producing unpredictable, misaligned, or strategically opaque behavior.

I argue that such phenomena cannot be adequately addressed through behavioral evaluation alone, nor through existing training paradigms such as reinforcement learning from human feedback (RLHF). Instead, we require mechanistic accounts of how internal representations and computational circuits give rise to agentic behavior. I will survey recent progress in mechanistic interpretability, with a focus on efforts to reverse-engineer learned circuits associated with capabilities such as theory of mind, to develop predictive and causal models of model behavior.

I conclude by asking a broader question: to what extent is mechanistic interpretability necessary to tame agentic systems, and is it sufficient?

About the speaker

Natalie Shapira is a postdoctoral researcher in the Interpretation of Deep Networks lab at the Khoury College of Computer Sciences, Northeastern University. In her PhD, she combined natural language processing, deep learning, and clinical psychology. With over ten years in industry, she most recently worked as a researcher at Amazon Science. Before that, she held a research position at IBM’s research labs, where she served on the Patent Committee. Natalie also has entrepreneurial experience as a co-founder and CSO in projects funded by the Israel Innovation Authority.

Course Duration

1.5

Course Type

Short Course

Participation terms

Attendance is free and open to everyone interested. Please register via the course link, and you will receive the Zoom meeting details one day before the seminar.

Language

English (with subtitles)

Modality (online/in person):

Online

Notes

For AIDA students only: In addition to registering via the course link, please click the “Enroll in this course” button at the bottom of the page to ensure that the course appears on your AIDA Certificate of Course Attendance upon successful completion.

Host Institution
Université Côte d'Azur
