Portrait

Hello, my name is Sonia.

I care about understanding humans as thinking, feeling, computational machines and using these insights to build artificial intelligence that better serves a diversity of human intelligence. I study how machines act as social partners, and use the tools of computational cognitive science and human-AI interaction to evaluate and improve their behavior.

My scientific endeavor to understand the substrates of our social, imaginative, and introspective minds pushes against my spiritual and artistic questions about the wonder of our existence. I view science and technology as one channel for these thought forms and seek to understand them more deeply by instantiating them in machines.

I am currently exploring these threads as a PhD student in Computer Science at Harvard University. I am grateful to be advised by Tomer Ullman and Elena Glassman, and to be supported by the NSF Graduate Research Fellowship and the Kempner Institute Graduate Fellowship. Previously, I spent time at ML Alignment & Theory Scholars (MATS), the Allen Institute for Artificial Intelligence (AI2), and Princeton University's Computational Cognitive Science Lab, where I gained mentors and collaborators who continue to shape my research.

Publications


Inside you are many wolves: Using cognitive models to reveal value trade-offs in language models

Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman

ICLR (2026)

An earlier iteration of this work appeared as spotlight talks at the Pragmatic Reasoning in Language Models workshop @ COLM 2025 and the Interpreting Cognition in Deep Learning Models workshop @ NeurIPS 2025

Priors in Time: Missing Inductive Biases for Language Model Interpretability

Ekdeep Singh Lubana*, Can Rager*, Sai Sumedh R. Hindupur*, Valerie Costa, Greta Tuckute, Oam Patel, Sonia K. Murthy, Thomas Fel, Daniel Wurgaft, Eric J. Bigelow, Johnny Lin, Demba Ba, Martin Wattenberg, Fernanda Viegas, Melanie Weber, Aaron Mueller

ICLR (2026)

An earlier iteration of this work appeared at the Interpreting Cognition in Deep Learning Models workshop @ NeurIPS 2025

One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity

Sonia K. Murthy, Tomer Ullman, Jennifer Hu

NAACL (2025)

Comparing the Evaluation and Production of Loophole Behavior in Humans and Large Language Models

Sonia K. Murthy, Kiera Parece, Sophie Bridgers, Peng Qian, Tomer Ullman

EMNLP Findings (2023)

An earlier iteration of this work appeared at the First Workshop on Theory of Mind in Communicating Agents @ ICML 2023

ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts

Sonia K. Murthy, Kyle Lo, Daniel King, Chandra Bhagavatula, Bailey Kuehl, Sophie Johnson, Jon Borchardt, Daniel S. Weld, Tom Hope, Doug Downey

EMNLP System Demonstrations (2022)

An earlier iteration of this work appeared at the Fifth Widening Natural Language Processing Workshop @ EMNLP 2021

Shades of confusion: Lexical uncertainty modulates ad hoc coordination in an interactive communication task

Sonia K. Murthy, Thomas L. Griffiths, Robert D. Hawkins

Cognition (2022)

Invited talks

February 2026 Brown University, ANCOR seminar series

January 2026 Bay Area AI Safety meetup

November 2025 Google DeepMind, VOICES team

October 2025 Annual Meeting of the Society for Neuroeconomics, "AI in Neuroeconomics" panel

If you would like to chat about research, or are a woman/minority student considering graduate school in Psychology or Computer Science, please feel free to reach out to me at