Research

My general interests are very broad: I’m interested in AI, machine learning, programming languages, complexity theory, algorithms, security, quantum computing, you name it. However, I believe that we are likely to build human-level AI systems within this century, and that we should invest in ensuring that this is beneficial for humanity.

As a result, I focus on high-level questions about the future of AI: What techniques will we use to build human-level AI systems? How can we make these techniques safer? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.

My research at DeepMind focuses on finding general algorithms for building AI systems that pursue tasks without a formal definition.

My PhD dissertation focused on the idea that since we have optimized our environment to suit our preferences, an AI system should be able to infer aspects of our preferences just by observing the current state of the world. See also my dissertation talk and the paper and blog post that introduced the idea.
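
To make this intuition concrete, here is a minimal sketch (my own illustration, not the algorithm from the dissertation): in a hypothetical toy tabular MDP, each candidate reward function is scored by how likely a Boltzmann-rational agent optimizing it would have been to end up in the observed state. The transition tensor, the Boltzmann-rationality model, and every function name below are assumptions made purely for illustration.

    import numpy as np

    # Illustrative sketch only: a toy stand-in for the idea, not the dissertation's method.
    # T[s, a, s'] is a transition tensor; `reward` is a vector of per-state rewards.

    def boltzmann_policies(T, reward, horizon, beta=1.0):
        """Finite-horizon soft value iteration; returns one policy[s, a] per timestep."""
        V = np.zeros(T.shape[0])
        policies = []
        for _ in range(horizon):
            Q = reward[:, None] + T @ V                        # Q[s, a]
            policy = np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
            policy /= policy.sum(axis=1, keepdims=True)
            policies.append(policy)
            V = (policy * Q).sum(axis=1)                       # value under this policy
        return policies[::-1]                                  # index 0 = first timestep

    def prob_of_observed_state(T, reward, observed_state, horizon):
        """P(final state = observed_state | an agent optimized `reward` for `horizon` steps)."""
        state_dist = np.full(T.shape[0], 1.0 / T.shape[0])     # uniform prior over start states
        for policy in boltzmann_policies(T, reward, horizon):
            state_action = state_dist[:, None] * policy        # joint distribution over (s, a)
            state_dist = np.einsum("sa,sap->p", state_action, T)
        return state_dist[observed_state]

    def rank_candidate_rewards(T, candidate_rewards, observed_state, horizon=10):
        """Sort candidate reward vectors by how well they explain the observed state."""
        scores = [prob_of_observed_state(T, r, observed_state, horizon) for r in candidate_rewards]
        return sorted(range(len(candidate_rewards)), key=lambda i: -scores[i])

The point of the toy model is just that a state which random behavior would rarely produce carries real evidence about the preferences of whoever shaped it.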

In the past, I worked with Ras Bodik in the PLSE lab at the University of Washington. I applied program synthesis techniques to automatically generate incremental update rules that accelerated approximate sampling algorithms used in probabilistic programming. I have also applied partial evaluation and memoization to compile sampling algorithms.
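
As a concrete (and purely hypothetical) example of what an incremental update rule buys: in Metropolis-Hastings over a factor graph, a proposal that changes one variable only needs to re-evaluate the factors that mention that variable, not the full joint. The factor-graph representation and names below are my own sketch, not the synthesized rules from that work.

    import math
    import random

    # Hypothetical sketch: a factor graph is a list of (log_potential, variable_indices) pairs,
    # and factors_of_var[var] lists the indices of factors that mention `var`.

    def full_log_joint(factors, assignment):
        # O(#factors): the quantity a naive sampler recomputes from scratch on every proposal.
        return sum(log_potential(assignment) for log_potential, _ in factors)

    def incremental_mh_step(factors, factors_of_var, assignment, var, propose):
        # O(#factors touching `var`): the incremental update rule, for a symmetric proposal.
        old_value = assignment[var]
        touched = factors_of_var[var]
        old_local = sum(factors[i][0](assignment) for i in touched)
        assignment[var] = propose(old_value)
        new_local = sum(factors[i][0](assignment) for i in touched)
        # Accept with probability min(1, exp(new_local - old_local)); otherwise revert.
        if random.random() >= math.exp(min(0.0, new_local - old_local)):
            assignment[var] = old_value
        return assignment

The synthesis work aimed to derive this kind of specialized update automatically from the model, rather than having a programmer hand-write it for each model.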

Publications

An Empirical Investigation of Representation Learning for Imitation. Xin Chen*, Sam Toyer*, Cody Wild*, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah. In NeurIPS 2021 Datasets and Benchmarks Track.

The MineRL BASALT Competition on Learning from Human Feedback. Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan. In NeurIPS 2021 Competition Track.
Supplementary materials: BAIR blog post, Website, Competition page, Twitter thread

Optimal Policies Tend to Seek Power. Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli. In Neural Information Processing Systems (NeurIPS 2021).

Evaluating the Robustness of Collaborative Agents. Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, Anca Dragan, Rohin Shah. Extended Abstract in 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021).

Learning What To Do by Simulating the Past. David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan. In 9th International Conference on Learning Representations (ICLR 2021).
Supplementary materials: BAIR blog post, Talk, Website, Code, Twitter thread

Benefits of Assistance over Reward Learning. Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell. In Workshop on Cooperative AI (Cooperative AI @ NeurIPS 2020). Best Paper Award.

The MAGICAL Benchmark for Robust Imitation. Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell. In Neural Information Processing Systems (NeurIPS 2020).
Supplementary materials: GitHub, PyPI

On the Utility of Learning about Humans for Human-AI Coordination. Micah Carroll, Rohin Shah, Mark Ho, Tom Griffiths, Sanjit Seshia, Pieter Abbeel, Anca Dragan. In Neural Information Processing Systems (NeurIPS 2019).
Supplementary materials: BAIR blog post, poster, Alignment Forum post, Twitter thread.

Preferences Implicit in the State of the World. Rohin Shah*, Dmitrii Krasheninnikov*, Jordan Alexander, Pieter Abbeel, and Anca Dragan. In 7th International Conference on Learning Representations (ICLR 2019).
Supplementary materials: BAIR blog post, poster, Alignment Forum post, Twitter thread.

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference. Rohin Shah, Noah Gundotra, Pieter Abbeel, and Anca Dragan. In International Conference on Machine Learning (ICML 2019).
Supplementary materials: Alignment Forum post, poster.

Choice Set Misspecification in Reward Inference. Rachel Freedman, Rohin Shah, Anca Dragan. In Workshop on Artificial Intelligence Safety (AISafety @ IJCAI-PRICAI 2020). Best Paper Award.

Combining reward information from multiple sources. Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof. In Workshop on Learning with Rich Experience (LIRE @ NeurIPS 2019).

Active Inverse Reward Design. Sören Mindermann*, Rohin Shah*, Adam Gleave and Dylan Hadfield-Menell. In Workshop on Goal Specifications for Reinforcement Learning (GoalsRL 2018).

Scalable Synthesis with Symbolic Syntax Graphs. Rohin Shah, Sumith Kulal, and Rastislav Bodik. In Seventh Workshop on Synthesis (SYNT 2018).

Automated Incrementalization through Synthesis. Rohin Shah and Rastislav Bodik. In First Workshop on Incremental Computing (IC 2017).

SIMPL: A DSL for Automatic Specialization of Inference Algorithms. Rohin Shah, Emina Torlak and Rastislav Bodik. arXiv:1604.04729.

Chlorophyll: Synthesis-Aided Compiler for Low-Power Spatial Architectures. Phitchaya Mangpo Phothilimthana, Tikhon Jelvis, Rohin Shah, Nishant Totla, Sarah Chasins, and Rastislav Bodik. In 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2014).

Research-related blog posts

AI Alignment 2018-19 Review. Alignment Forum, 2020. A literature review of the conceptual work in AI alignment over 2019, with some inclusions from 2018.

Collaborating with Humans Requires Understanding Them. BAIR blog, 2019. Explains the high-level ideas in our paper “On the Utility of Learning about Humans for Human-AI Coordination”.

Human-AI Collaboration. Alignment Forum, 2019. Speculates on how the paper “On the Utility of Learning about Humans for Human-AI Coordination” fits into the broader landscape of AI alignment research.

A review of Stuart Russell’s book, Human Compatible. Special edition of the Alignment Newsletter, 2019. A summary and review of Human Compatible and several research papers that inform it.

Clarifying some key hypotheses in AI alignment. Alignment Forum, 2019. The title says it all. This post was primarily written by Ben Cottier, and I played an advisory role.

Learning biases and rewards simultaneously. Alignment Forum, 2019. Speculates on how the paper “On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference” fits into the broader landscape of AI alignment research.

Learning Preferences by Looking at the World. BAIR blog, 2019. Explains the high-level ideas in our paper “Preferences Implicit in the State of the World”.

Learning preferences by looking at the world. Alignment Forum, 2019. Speculates on how the paper “Preferences Implicit in the State of the World” fits into the broader landscape of AI alignment research.

Value Learning sequence. Alignment Forum, 2019. A sequence of blog posts investigating the feasibility of value learning as an approach to AI alignment.

Selected Presentations

Spoke on the AI Alignment Podcast three times:

  1. An Overview of Technical AI Alignment in 2018 and 2019 (with Buck Shlegeris)
  2. An Overview of Technical AI Alignment: Part 1 and Part 2.
  3. Inverse Reinforcement Learning and the State of AI Alignment

Spoke at EA Global 2020 and the Foresight AGI Strategy Meetup 2020 about the landscape of AI alignment work.

The Importance of Threat Models for AI Alignment (slides). Talk at the SlateStarCodex meetup about why we should build threat models.

Researchers at AI Impacts interviewed me about my reasons for believing that AI safety will probably be solved without additional intervention from people focused on improving the far future. See also my summary of that and three other related conversations for the Alignment Newsletter.

Interviewed on the Machine Ethics podcast about alignment problems in AI, constraining AI behavior, current AI vs future AI, recommendation algorithms and extremism, appropriate uses of AI, and the fuzziness of fairness.

AGI Safety Research Agendas (slides). Invited talk at Beneficial AGI 2019. Describes and contrasts the various research agendas that people are pursuing for AGI safety.

Decomposing Beneficial AI (slides). Lightning talk at Beneficial AGI 2019. Argues that we should think about alignment in terms of an AI system’s motivation and competence, rather than in terms of a definition of what it should do and its ability to optimize that definition. See also this comment.

Service

Editor and primary content producer of the Alignment Newsletter.

PC Member (conferences): NeurIPS (2021, 2020, 2019), ICLR (2021, 2020), ICML (2019), NeurIPS Datasets and Benchmarks Track (2021). Outstanding Reviewer Award for NeurIPS 2021.

PC Member (workshops): DeepRL (NeurIPS 2021, 2020, 2019, 2018), Cooperative AI (NeurIPS 2021), RL4RL (ICML 2021), ERS (IROS 2021), HCML (NeurIPS 2019), I3 (ICML 2019), SafeML (ICLR 2019), IJCAI-ECAI-18 Survey Track and Incremental Computing (PLDI 2017).