Rank TD: End-to-End Robotic Reinforcement Learning Without Reward Engineering and Demonstrations


Advances in reinforcement learning and deep neural networks allow us to train decision-making systems for robots end to end, mapping raw sensory inputs directly to actions. However, designing a reward function that both reflects the goal of the task and facilitates the agent’s exploration is tedious and challenging. This paper introduces a technique that lets agents explore by following an expert-designed state trajectory, striking a balance between the agent's creativity and the rigid rules imposed by prior knowledge. We evaluate our approach on a simple case and on a complex robotic-arm grasping task. The results suggest that the method is promising for sim2real applications.

JSME Conference on Robotics and Mechatronics 2020 (ROBOMECH2020)