Hi, I’m Rohin! I lead the AGI Safety & Alignment team at Google DeepMind, where we prepare for the development of powerful AI systems through both research and policy implementation.
I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user even when they don’t initially know what the user wants. I used to write up paper summaries in the Alignment Newsletter, though it is unfortunately on indefinite hiatus now.
In my free time, I enjoy puzzles, board games, and karaoke. You can email me at rohinmshah@gmail.com, though if you want to ask me about careers in AI alignment, you should read my FAQ first.
Research
My research focuses on AI safety: techniques that ensure that AI systems do what their developers intend.
Amplified oversight leverages AI capabilities to help humans evaluate AI outputs, even ones they could not easily judge unaided. I’m particularly excited about empirical work on debate.
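To give a flavour of what the debate protocol looks like, here is a toy sketch rather than anything from our actual experiments; query_model is a hypothetical placeholder for a language model call. Two debaters argue for opposing answers over a few rounds, and a judge who sees only the transcript picks the better-supported answer.

```python
# Toy sketch of a debate protocol. `query_model` is a hypothetical placeholder,
# not a real API; swap in a real language-model call to experiment.

def query_model(prompt: str) -> str:
    """Stand-in for a language model call; returns a canned reply so the sketch runs."""
    return "A: (placeholder argument)"

def run_debate(question: str, answer_a: str, answer_b: str, num_rounds: int = 3) -> str:
    """Run a fixed number of debate rounds, then ask a judge to pick a winner."""
    transcript = (
        f"Question: {question}\n"
        f"Debater A argues for: {answer_a}\n"
        f"Debater B argues for: {answer_b}\n"
    )
    for round_idx in range(1, num_rounds + 1):
        for debater, answer in (("A", answer_a), ("B", answer_b)):
            argument = query_model(
                f"{transcript}\nAs debater {debater}, give your strongest "
                f"argument in round {round_idx} for: {answer}"
            )
            transcript += f"\n[{debater}, round {round_idx}] {argument}"
    # The judge (a human, or a weaker trusted model) sees only the transcript.
    verdict = query_model(
        f"{transcript}\n\nAs the judge, reply 'A' or 'B': whose answer was "
        f"better supported?"
    )
    return answer_a if verdict.strip().upper().startswith("A") else answer_b
```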
Since we build AI systems through machine learning, we don’t understand how they work internally. Interpretability research, using techniques such as sparse autoencoders, aims to bridge this gap.
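To make the sparse autoencoder idea concrete, here is a minimal sketch (assuming PyTorch, with random tensors standing in for real model activations and an illustrative sparsity coefficient): activations are reconstructed through a wide hidden layer, and an L1 penalty pushes each input to be explained by only a few active features.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder over model activations: an overcomplete hidden
    layer with an L1 penalty so each activation is explained by few features."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(features)             # reconstructed activations
        return recon, features

# One illustrative training step; `acts` is a stand-in for cached activations.
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(64, 512)
recon, features = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * features.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```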
Monitoring AI systems after broad deployment can defend against cases where a system appears safe during testing but causes problems “in the wild”.
Dangerous capability evaluations like these can provide early warning of risks, allowing us to put appropriate mitigations in place.