Speeding Up Deep Reinforcement Learning Training in Autonomous Driving with Latent State Representation

Emmanuel Ifeanyi Iroegbu, Devaraj Madhavi


Deep reinforcement learning has been successful at solving common autonomous driving tasks, such as lane keeping, by simply using pixel data from a front-view camera as input. However, raw pixel data constitutes a very high-dimensional observation that degrades the learning quality of the agent due to the complexity imposed by a 'realistic' urban environment. We therefore investigate how compressing the raw pixel data from a high-dimensional state to a low-dimensional latent space using a variational autoencoder can significantly improve the training of the agent. We evaluate our method on a simulated autonomous vehicle in CARLA and compare our results with several baselines, including DDPG, PPO, and SAC. The results show that our method greatly speeds up training and significantly improves the quality of deep reinforcement learning.
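As a sketch of the core idea (not the paper's actual architecture; the frame size, latent dimension, and weight initialization below are illustrative assumptions), a VAE encoder maps a raw camera frame to a low-dimensional latent vector via the reparameterization trick, and the RL agent then consumes that vector instead of the pixels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a flattened camera frame vs. a small latent space.
OBS_DIM = 80 * 160 * 3   # raw pixel observation (38,400 values)
LATENT_DIM = 64          # compressed state fed to the RL agent

# Randomly initialized linear encoder weights; a trained VAE would learn these
# (typically with convolutional layers rather than a single linear map).
W_mu = rng.normal(0.0, 0.01, (LATENT_DIM, OBS_DIM))
W_logvar = rng.normal(0.0, 0.01, (LATENT_DIM, OBS_DIM))

def encode(frame):
    """Compress a raw frame to a latent sample z = mu + sigma * eps."""
    x = frame.reshape(-1) / 255.0          # flatten and normalize pixels
    mu = W_mu @ x                          # latent mean
    logvar = W_logvar @ x                  # latent log-variance
    eps = rng.standard_normal(LATENT_DIM)  # reparameterization noise
    return mu + np.exp(0.5 * logvar) * eps

frame = rng.integers(0, 256, size=(80, 160, 3))
z = encode(frame)
print(z.shape)  # the agent observes a 64-d vector instead of 38,400 pixels
```

Because the encoder can be pretrained on collected frames before reinforcement learning starts, the policy network only has to learn from the compact latent state, which is what drives the reported speed-up.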


Deep reinforcement learning, Autonomous driving, Latent state representation, Variational autoencoder



DOI: http://doi.org/10.11591/ijai.v10.i3.pp%25p


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.