Generative Attention Learning: a “GenerAL” framework for high-performance multi-fingered grasping in clutter

Published in Autonomous Robots, 2020

Recommended citation: Wu, Bohan, Akinola, Iretiayo, Gupta, Abhi, Xu, Feng, Varley, Jacob, Watkins-Valls, David, and Allen, Peter K. Generative Attention Learning: a “GenerAL” framework for high-performance multi-fingered grasping in clutter. Autonomous Robots (2020). doi:10.1007/s10514-020-09907-y. https://par.nsf.gov/biblio/10164432

Abstract

Generative Attention Learning (GenerAL) is a framework for high-DOF multi-fingered grasping that is not only robust to dense clutter and novel objects but also effective with a variety of parallel-jaw and multi-fingered robot hands. The framework introduces a novel attention mechanism that substantially improves the grasp success rate in clutter. Its generative nature enables learning full-DOF grasps with flexible end-effector positions and orientations, as well as all finger joint angles of the hand. Trained purely in simulation, the framework closes the sim-to-real gap on two fronts. To close the visual gap, it takes a single depth image as input. To close the dynamics gap, it circumvents continuous motor control with a direct mapping from pixel to Cartesian space inferred from the same depth image. Finally, the framework demonstrates inter-robot generality by achieving over 92% real-world grasp success in cluttered scenes with novel objects on two multi-fingered robotic hand-arm systems with different degrees of freedom.
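The pixel-to-Cartesian mapping described in the abstract corresponds, in standard practice, to a pinhole-camera deprojection of the chosen grasp pixel using the camera intrinsics. The sketch below is illustrative only (not the paper's code): the function name, intrinsic values, and synthetic depth image are assumptions, but the deprojection math is the standard camera model. It shows how a grasp pixel selected from a depth image maps directly to a 3D point in the camera frame, with no continuous motor control in the loop.

```python
# Minimal sketch of pixel-to-Cartesian deprojection from a depth image.
# All intrinsic values and the depth image below are placeholders.
import numpy as np

def pixel_to_cartesian(depth_image, u, v, fx, fy, cx, cy):
    """Deproject pixel (u, v) with measured depth into camera-frame XYZ."""
    z = depth_image[v, u]      # depth in meters at the chosen pixel
    x = (u - cx) * z / fx      # horizontal offset from the optical axis
    y = (v - cy) * z / fy      # vertical offset from the optical axis
    return np.array([x, y, z])

# Example: a synthetic 480x640 depth image of a flat scene 0.6 m away,
# with placeholder intrinsics roughly typical of a consumer depth camera.
depth = np.full((480, 640), 0.6, dtype=np.float32)
grasp_point = pixel_to_cartesian(depth, u=320, v=240,
                                 fx=615.0, fy=615.0, cx=320.0, cy=240.0)
print(grasp_point)  # -> [0. 0. 0.6] for the principal point
```

In a full grasping pipeline, this camera-frame point would then typically be transformed into the robot base frame via the hand-eye calibration before commanding the end-effector.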