Hello, my name is Sonia.
I care about understanding humans as thinking, feeling, computational machines and using these insights to build artificial intelligence that better serves a diversity of human intelligence. I study how machines act as social partners, and use the tools of computational cognitive science and human-AI interaction to evaluate and improve their behavior.
My scientific endeavor to understand the substrates of our social, imaginative, and introspective minds pushes against my spiritual and artistic questions about the wonder of our existence. I view science and technology as one channel for these thought forms and seek to understand them more deeply by instantiating them in machines.
I am currently exploring these threads as a PhD student in Computer Science at Harvard University. I am grateful to be advised by Tomer Ullman and Elena Glassman, and to be supported by the NSF Graduate Research Fellowship and the Kempner Institute Graduate Fellowship. Previously, I spent time at ML Alignment & Theory Scholars (MATS), the Allen Institute for Artificial Intelligence (AI2), and Princeton University's Computational Cognitive Science Lab, where I gained mentors and collaborators who continue to shape my research.
Publications
spotlight talk
Inside you are many wolves: Using cognitive models to reveal value trade-offs in language models
ICLR (2026)
An earlier iteration of this work appeared as spotlight talks at the Pragmatic Reasoning in Language Models workshop @ COLM 2025 and the Interpreting Cognition in Deep Learning Models workshop @ NeurIPS 2025
Priors in Time: Missing Inductive Biases for Language Model Interpretability
ICLR (2026)
An earlier iteration of this work appeared at the Interpreting Cognition in Deep Learning Models workshop @ NeurIPS 2025
One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity
NAACL (2025)
Comparing the Evaluation and Production of Loophole Behavior in Humans and Large Language Models
EMNLP Findings (2023)
An earlier iteration of this work appeared at the First Workshop on Theory of Mind in Communicating Agents @ ICML 2023
ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts
EMNLP System Demonstrations (2022)
An earlier iteration of this work appeared at the Fifth Widening Natural Language Processing Workshop @ EMNLP 2021
Invited talks
Brown University, ANCOR seminar series
Bay Area AI Safety meetup
Google DeepMind, VOICES team
Annual Meeting of the Society for Neuroeconomics, "AI in Neuroeconomics" panel
If you would like to chat about research, or are a woman/minority student considering graduate school in Psychology or Computer Science, please feel free to reach out to me at soniamurthy [at] g [dot] harvard [dot] edu and I will do my best to respond!