Overview

This website contains supplementary material for our work "Learning Manipulation States and Actions for Efficient Non-prehensile Rearrangement Planning". In this work, we address non-prehensile rearrangement planning problems, where a robot is tasked with rearranging objects among obstacles on a planar surface. We present an efficient planning algorithm that is designed to impose few assumptions on the robot's non-prehensile manipulation abilities and is simple to adapt to different robot embodiments. The manuscript is available here.

In our approach, we combine sampling-based motion planning with reinforcement learning and generative modeling. Our algorithm explores the composite configuration space of objects and robot through a search over robot actions that are forward-simulated in a dynamic physics model. This search is guided by a generative model that provides samples of robot states from which an object can be transported towards a desired state, and by a learned policy that provides the corresponding robot actions. For the generative model, we apply Generative Adversarial Networks (GANs) to efficiently produce robot state samples within the planner.
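As a rough, illustrative sketch of this procedure (not the interface of our released implementation; all identifiers such as guided_extension, tree, gan, policy and physics are placeholders), a single guided extension of the search tree could look as follows:

    # Illustrative sketch of a single guided search-tree extension. All
    # identifiers (tree, gan, policy, physics, nu) are placeholders and do
    # not correspond to the API of our released implementation.
    import numpy as np

    def guided_extension(tree, gan, policy, physics, nu, goal, num_samples=256):
        # Select a tree node and the object to transport next, and pick an
        # intermediate target state for that object (biased towards the goal).
        node, obj = tree.select_node_and_object(goal)
        target = tree.sample_object_target(node, obj, goal)

        # The generative model proposes robot states from which the object can
        # be pushed towards the target; here we keep the one closest to the
        # current robot state.
        candidates = gan.sample(node.object_state(obj), target, nu[obj], n=num_samples)
        q_manip = min(candidates, key=lambda q: np.linalg.norm(q - node.robot_state))

        # Move the robot to the proposed manipulation state, then query the
        # learned policy for a pushing action parameterized by nu[obj].
        node = tree.steer_robot(node, q_manip, physics)
        action = policy(node.robot_state, node.object_state(obj), target, nu[obj])

        # Forward simulate the action in the dynamic physics model and add the
        # resulting composite robot-object state as a new node.
        new_state = physics.propagate(node.state, action)
        return tree.add_node(parent=node, action=action, state=new_state)

As noted under Extension Strategy below, the full algorithm also interleaves random extension steps and extension steps that move only the robot.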

We implemented and evaluated our approach for robots with configuration spaces in SE(2). Using this implementation, we collected videos of example solutions produced by our planner, as well as visualizations of the learned models. In addition, this page provides an illustrative video explaining the search tree extension procedure of our algorithm. For more details, we refer to our manuscript.

Example Solutions

Slalom - Navigating an object around static obstacles
Long Slalom - Navigating an object around static obstacles for a longer distance
Slalom Slippery - Navigating a sliding object around static obstacles
Dual Slalom - Navigating two objects with different physical properties around static obstacles
ABC - Rearranging multiple objects with different physical properties
Movable Wall - Rearranging two objects among movable obstacles

Extension Strategy

The following video illustrates how our algorithm performs its search tree extension when using the generative model and the policy. Note that the algorithm as described in our manuscript also performs random extension steps as well as extension steps targeted at moving only the robot. For brevity, these cases are not illustrated in the video.

Policy

The policy is learned offline from random robot-object interactions. To make the policy applicable to different objects, we parameterize it by an object's physical properties \(\nu_i\). In the following, we illustrate how changing parts of \(\nu_i\) affects the action chosen by the learned policy.
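As an illustrative sketch of this parameterization (the class name PushPolicy, layer sizes and input dimensions are assumptions, not the architecture reported in the manuscript), the physical properties \(\nu_i\) can simply be concatenated to the policy network's input:

    # Illustrative sketch of a policy network that takes an object's physical
    # properties nu as part of its input. The class name, layer sizes and
    # input dimensions are assumptions, not the architecture of the manuscript.
    import torch
    import torch.nn as nn

    class PushPolicy(nn.Module):
        def __init__(self, state_dim=6, target_dim=3, nu_dim=4, action_dim=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + target_dim + nu_dim, 128),
                nn.ReLU(),
                nn.Linear(128, 128),
                nn.ReLU(),
                nn.Linear(128, action_dim),  # e.g. x/y velocity and push duration
            )

        def forward(self, robot_object_state, desired_object_state, nu):
            # The same network generalizes over objects because nu is part of
            # the input; changing nu changes the selected action.
            x = torch.cat([robot_object_state, desired_object_state, nu], dim=-1)
            return self.net(x)

    # Example call with (batched) dummy inputs and untrained weights.
    policy = PushPolicy()
    u = policy(torch.zeros(1, 6), torch.zeros(1, 3), torch.zeros(1, 4))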

As shown in our manuscript, the expected distance between a desired object state and the actual state resulting from an action can be computed from the learned forward models \(f_\mu\) and \(f_{\sigma^2}\). In general, this distance can be expressed as a function \(L(u)\) of the chosen action \(u\).
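The exact expression is given in the manuscript and is not reproduced on this page. As a sketch only, if \(f_\mu\) and \(f_{\sigma^2}\) predict the mean and the per-dimension variance of a Gaussian distribution over the resulting object state \(s'\), the expected squared distance to a desired state \(s^*\) takes the form

\[ L(u) = \mathbb{E}\left[\lVert s^* - s' \rVert^2\right] = \lVert s^* - f_\mu(s, u, \nu_i) \rVert^2 + \sum_j f_{\sigma^2}(s, u, \nu_i)_j, \]

where \(s\) denotes the current robot and object state and \(u\) the chosen action; the argument lists of the forward models are assumptions here and may differ from the notation used in the manuscript.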

The following video illustrates how this function changes as we modify only the friction parameter between the pushed object and the support surface (which is part of \(\nu_i\)) and keep all other inputs fixed. The object is located between the fingers of the robot, and the desired object state is a few centimeters above the initial state. The white dot shows the \(x\) and \(y\) velocities that the learned policy selects in the given situation. The colored areas show the value of \(L\) for different choices of the action \(u\).

As the friction increases, more energy is needed to transport the object towards the desired state. The action space we chose in our implementation is the parameter space of ramp-shaped velocity profiles, which guarantees that the robot is at rest after each action. Given this parameterization, the policy can influence the kinetic energy transferred to the pushed object by choosing different robot velocities. Thus, as we increase the friction parameter, the velocity of the optimal action increases.
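The following sketch illustrates such a ramp-shaped profile. The function name ramp_profile and the scalar speed are illustrative; in our implementation the policy selects \(x\) and \(y\) velocities, but the shape of the profile is analogous.

    # Illustrative sketch of a ramp-shaped velocity profile. The function name
    # and the scalar speed are placeholders; the policy in the paper selects
    # x and y velocities, but the profile shape is the same.
    import numpy as np

    def ramp_profile(v_peak, duration, dt=0.01):
        """Velocity that ramps linearly from 0 up to v_peak and back to 0,
        so the robot is guaranteed to be at rest after the action."""
        t = np.arange(0.0, duration + dt, dt)
        half = duration / 2.0
        v = np.where(t <= half,
                     v_peak * t / half,               # accelerate
                     v_peak * (duration - t) / half)  # decelerate
        return t, np.clip(v, 0.0, None)

    # A higher peak velocity transfers more kinetic energy to the pushed
    # object, which the policy exploits when friction is high.
    t, v = ramp_profile(v_peak=0.2, duration=1.0)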

The next video illustrates \(L\) as we change the desired target state. We can observe that the trained policy correctly learned in which direction it should move the robot in order to transport the object towards the desired state.

Generative Model

The following video shows samples of manipulation states for a round object, generated by the learned generative model. The samples are generated in real time (256 per frame). The object is located at the center of the plot, and the desired target state is indicated by the dotted circle. As we change the radius of the object, the samples move slightly away from the object center. When we change the distance to the target state, the samples become more or less concentrated on the opposite side of the object. As we change the direction (angle) in which to push the object, the samples rotate around the object accordingly.
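As an illustrative sketch of such a conditional generator (the class name StateGenerator, layer sizes and the exact conditioning variables, here object radius, distance to the target and push direction, are assumptions based on the description above), samples are produced by feeding random noise together with the condition through a small network:

    # Illustrative sketch of a conditional GAN generator for manipulation
    # states. The class name, layer sizes and the exact conditioning variables
    # (object radius, distance and direction to the target) are assumptions.
    import torch
    import torch.nn as nn

    class StateGenerator(nn.Module):
        def __init__(self, noise_dim=8, cond_dim=3, state_dim=3):
            super().__init__()
            self.noise_dim = noise_dim
            self.net = nn.Sequential(
                nn.Linear(noise_dim + cond_dim, 128),
                nn.ReLU(),
                nn.Linear(128, 128),
                nn.ReLU(),
                nn.Linear(128, state_dim),  # a robot state in SE(2): x, y, theta
            )

        def sample(self, condition, n=256):
            # condition: e.g. [radius, distance to target, push direction]
            z = torch.randn(n, self.noise_dim)
            c = condition.expand(n, -1)
            return self.net(torch.cat([z, c], dim=-1))

    # 256 samples per frame, as in the video (untrained weights here).
    generator = StateGenerator()
    samples = generator.sample(torch.tensor([[0.05, 0.10, 0.0]]))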

Contact

For technical questions, please contact either Joshua A. Haustein or Isac Arnekvist.
Joshua A. Haustein: haustein at kth dot se
Isac Arnekvist: isacar at kth dot se
Johannes Stork: jastork at kth dot se
Kaiyu Hang: kaiyu dot hang at yale dot edu
Danica Kragic: dani at kth dot se
Joshua A. Haustein, Isac Arnekvist, Johannes Stork and Danica Kragic are with the
Robotics, Perception and Learning Lab (RPL)
CAS, EECS
KTH Royal Institute of Technology
Stockholm, Sweden
Kaiyu Hang is with the
GRAB Lab
Yale University
New Haven, USA

Source Code

The source code of our implementation is available on GitHub.