DIP-RL: Demonstration-Inferred Preference Learning in Minecraft

Published in ICML 2023 Workshop on The Many Facets of Preference-Based Learning, 2023

Recommended citation: Novoseller, E., Goecks, V. G., Watkins, D., Miller, J., & Waytowich, N. R. (2023). DIP-RL: Demonstration-Inferred Preference Learning in Minecraft. In ICML 2023 Workshop on The Many Facets of Preference-Based Learning. https://arxiv.org/abs/2307.12158

Abstract

In machine learning for sequential decision-making, an algorithmic agent learns to interact with an environment while receiving feedback in the form of a reward signal. However, in many unstructured real-world settings, such a reward signal is unknown and humans cannot reliably craft a reward signal that correctly captures desired behavior. To solve tasks in such unstructured and open-ended environments, we present Demonstration-Inferred Preference Reinforcement Learning (DIP-RL), an algorithm that leverages human demonstrations in three distinct ways:

  1. Training an autoencoder
  2. Seeding reinforcement learning (RL) training batches with demonstration data
  3. Inferring preferences over behaviors to learn a reward function to guide RL (a rough sketch of these three steps appears below)
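The abstract does not spell out implementation details, so the following is a minimal, hypothetical PyTorch sketch of what these three uses of demonstrations could look like. All names (`ObsAutoencoder`, `seed_training_batch`, `RewardModel`, `preference_loss`) are illustrative rather than the authors' code, and the assumption that preferences are generated by preferring demonstration segments over agent segments is one plausible reading of "demonstration-inferred preferences", not a statement of the paper's exact method.

```python
# Hypothetical sketch of the three ways DIP-RL might use demonstrations.
# Component names and the preference-generation rule are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObsAutoencoder(nn.Module):
    """(1) Autoencoder trained on demonstration observations to learn a
    compact state representation for the downstream RL agent."""
    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, obs_dim))

    def forward(self, obs):
        z = self.encoder(obs)
        return self.decoder(z), z


def train_autoencoder(ae, demo_obs, epochs=10, lr=1e-3):
    """Fit the autoencoder on demonstration observations only."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        recon, _ = ae(demo_obs)
        loss = F.mse_loss(recon, demo_obs)  # reconstruction objective
        opt.zero_grad()
        loss.backward()
        opt.step()


def seed_training_batch(demo_buffer, agent_buffer, batch_size, demo_frac=0.25):
    """(2) Mix demonstration transitions into every RL training batch."""
    n_demo = int(batch_size * demo_frac)
    demo_idx = torch.randint(len(demo_buffer), (n_demo,))
    agent_idx = torch.randint(len(agent_buffer), (batch_size - n_demo,))
    return [demo_buffer[i] for i in demo_idx] + [agent_buffer[i] for i in agent_idx]


class RewardModel(nn.Module):
    """(3) Reward function trained from preferences inferred from demonstrations."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def segment_return(self, latents):
        # latents: (T, latent_dim) encoded observations of one trajectory segment
        return self.net(latents).sum()


def preference_loss(reward_model, preferred_seg, other_seg):
    """Bradley-Terry style loss: the preferred segment (here assumed to be the
    demonstration segment) should receive a higher predicted return than the
    agent-generated segment."""
    r_pref = reward_model.segment_return(preferred_seg)
    r_other = reward_model.segment_return(other_seg)
    return -F.logsigmoid(r_pref - r_other)
```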

We evaluate DIP-RL on a tree-chopping task in Minecraft. Results suggest that DIP-RL can learn a reward function that reflects human preferences and that it performs competitively relative to baselines. DIP-RL is inspired by our previous work on combining demonstrations and pairwise preferences in Minecraft, which was awarded a research prize at the 2022 NeurIPS MineRL BASALT competition (Learning from Human Feedback in Minecraft). Example trajectory rollouts of DIP-RL and the baselines are available at this site.