User:Deisenroth
Marc Deisenroth
Dipl.-Inform.
I spend almost all of my time in Cambridge, where I work on Gaussian process models for reinforcement learning together with Carl Rasmussen (University of Cambridge) and Jan Peters (Max Planck Institute for Biological Cybernetics).
Please check my homepage in Cambridge for all research-related stuff (including papers, demos, code, ...).
Links
- my current homepage in Cambridge, UK
- Computational and Biological Learning Lab, University of Cambridge, UK
- Max Planck Institute for Biological Cybernetics Tübingen, Germany
- The Gaussian Process Web Site (introductions, code, books, ...)
Publications
Ryan Turner, Marc Peter Deisenroth, Carl Edward Rasmussen,
System Identification in Gaussian Process Dynamical Systems,
- Nonparametric Bayes Workshop at NIPS 2009, Whistler, Canada, December, 2009.
Author : Ryan Turner, Marc Peter Deisenroth, Carl Edward Rasmussen
Title : System Identification in Gaussian Process Dynamical Systems
In : Nonparametric Bayes Workshop at NIPS 2009
Date : December 2009
Efficient Reinforcement Learning for Motor Control,
- Proceedings of the 10th International PhD Workshop on Systems and Control, Hluboka nad Vltavou, Czech Republic, September, 2009.
Author : Marc P. Deisenroth, Carl E. Rasmussen
Title : Efficient Reinforcement Learning for Motor Control
In : Proceedings of the 10th International PhD Workshop on Systems and Control
Date : September 2009
Bayesian Inference for Efficient Learning in Control,
- Multidisciplinary Symposium on Reinforcement Learning (MSRL), Montreal, Canada, June, 2009.
Author : Marc Peter Deisenroth, Carl Edward Rasmussen
Title : Bayesian Inference for Efficient Learning in Control
In : Multidisciplinary Symposium on Reinforcement Learning (MSRL)
Date : June 2009
Analytic Moment-based Gaussian Process Filtering,
- 26th International Conference on Machine Learning (ICML 2009) in Montreal, Canada, June, 2009.
Author : Marc P. Deisenroth, Marco F. Huber, Uwe D. Hanebeck
Title : Analytic Moment-based Gaussian Process Filtering
In : 26th International Conference on Machine Learning (ICML 2009) in Montreal, Canada
Date : June 2009
Abstract : We propose an analytic moment-based filter for nonlinear stochastic
dynamic systems modeled by Gaussian processes. Exact expressions for the
expected value and the covariance matrix are provided for both the
prediction step and the filter step, where an additional Gaussian
assumption is exploited in the latter case. Our filter does not require
further approximations. In particular, it avoids finite-sample
approximations. We compare the filter to a variety of Gaussian filters,
that is, the EKF, the UKF, and the recent GP-UKF proposed by Ko et al.
(2007).
Gaussian Process Dynamic Programming,
- Neurocomputing, 72(7-9):1508-1524, March, 2009.
Author : Marc P. Deisenroth, Carl E. Rasmussen, Jan Peters
Title : Gaussian Process Dynamic Programming
In : Neurocomputing
Date : March 2009
Abstract : Reinforcement learning (RL) and optimal control of systems with continuous
states and actions require approximation techniques in most interesting cases.
In this article, we introduce Gaussian process dynamic programming (GPDP), an
approximate value-function based RL algorithm. We consider both a classic optimal
control problem, where problem-specific prior knowledge is available,
and a classic RL problem, where only very general priors can be used.
For the classic optimal control problem, GPDP models the unknown value
functions with Gaussian processes and generalizes dynamic programming to continuous-valued
states and actions. For the RL problem, GPDP starts from a given initial state
and explores the state space using Bayesian active learning. To
design a fast learner, available data has to be used efficiently.
Hence, we propose to learn probabilistic models of the a priori unknown
transition dynamics and the value functions on the fly. In both
cases, we successfully apply the resulting continuous-valued controllers
to the under-actuated pendulum swing up and analyze the performances of the
suggested algorithms. It turns out that GPDP uses data very efficiently and
can be applied to problems where classic dynamic programming would be cumbersome.
Approximate Dynamic Programming with Gaussian Processes,
- Proceedings of the 2008 American Control Conference (ACC 2008), pp. 4480-4485, Seattle, Washington, USA, June, 2008.
Author : Marc P. Deisenroth, Jan Peters, Carl E. Rasmussen
Title : Approximate Dynamic Programming with Gaussian Processes
In : Proceedings of the 2008 American Control Conference (ACC 2008)
Date : June 2008
Abstract : In general, it is difficult to determine an optimal closed-loop policy
in nonlinear control problems with continuous-valued state and control
domains. Hence, approximations are often inevitable. The standard
method of discretizing states and controls suffers from the curse
of dimensionality and strongly depends on the chosen temporal sampling
rate. In this paper, we introduce Gaussian process dynamic programming
(GPDP) and determine an approximate globally optimal closed-loop
policy. In GPDP, value functions in the Bellman recursion of the
dynamic programming algorithm are modeled using Gaussian processes.
GPDP returns an optimal state-feedback for a finite set of states.
Based on these outcomes, we learn a possibly discontinuous closed-loop
policy on the entire state space by switching between two independently
trained Gaussian processes. A binary classifier selects one Gaussian
process to predict the optimal control signal. We show that GPDP
is able to yield an almost optimal solution to an LQ problem using
few sample points. Moreover, we successfully apply GPDP to the underpowered
pendulum swing up, a complex nonlinear control problem.
Model-Based Reinforcement Learning with Continuous States and Actions,
- Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2008), pp. 19-24, Bruges, Belgium, April, 2008.
Author : Marc P. Deisenroth, Carl E. Rasmussen, Jan Peters
Title : Model-Based Reinforcement Learning with Continuous States and Actions
In : Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2008)
Date : April 2008
Abstract : Finding an optimal policy in a reinforcement learning (RL) framework
with continuous state and action spaces is challenging. Approximate
solutions are often inevitable. GPDP is an approximate dynamic programming
algorithm based on Gaussian process (GP) models for the value functions.
In this paper, we extend GPDP to the case of unknown transition dynamics.
After building a GP model for the transition dynamics, we apply GPDP
to this model and determine a continuous-valued policy in the entire
state space. We apply the resulting controller to the underpowered
pendulum swing up. Moreover, we compare our results on this RL task
to a nearly optimal discrete DP solution in a fully known environment.
Probabilistic Inference for Fast Learning in Control,
- Recent Advances in Reinforcement Learning. Proceedings of the 8th European Workshop on Reinforcement Learning (EWRL 2008), 5323:229-242, Springer-Verlag, November, 2008.
Author : Carl E. Rasmussen, Marc P. Deisenroth
Title : Probabilistic Inference for Fast Learning in Control
In : Recent Advances in Reinforcement Learning. Proceedings of the 8th European Workshop on Reinforcement Learning (EWRL 2008)
Date : November 2008
Abstract : We provide a novel framework for very fast model-based reinforcement
learning in continuous state and action spaces. The framework requires
probabilistic models that explicitly characterize their levels of confidence.
Within this framework, we use flexible, non-parametric models to describe the
world based on previously collected experience. We demonstrate learning on the
cart-pole problem in a setting where we provide very limited prior knowledge
about the task. Learning progresses rapidly, and a good policy is found after
only a handful of iterations.
Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces,
- Proceedings of the 2007 European Control Conference (ECC 2007), Kos, Greece, July, 2007.
Author : Marc P. Deisenroth, Florian Weissel, Toshiyuki Ohtsuka, Uwe D. Hanebeck
Title : Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces
In : Proceedings of the 2007 European Control Conference (ECC 2007)
Date : July 2007
Abstract : A novel online-computation approach to optimal control of nonlinear,
noise-affected systems with continuous state and control spaces is
presented. In the proposed algorithm, system noise is explicitly
incorporated into the control decision. This leads to superior results
compared to state-of-the-art nonlinear controllers that neglect this
influence. The solution of an optimal nonlinear controller for a
corresponding deterministic system is employed to find a meaningful
state space restriction. This restriction is obtained by means of
approximate state prediction using the noisy system equation. Within
this constrained state space, an optimal closed-loop solution for
a finite decision-making horizon (prediction horizon) is determined
within an adaptively restricted optimization space. Interleaving
stochastic dynamic programming and value function approximation yields
a solution to the considered optimal control problem. The enhanced
performance of the proposed discrete-time controller is illustrated
by means of a scalar example system. Nonlinear model predictive control
is applied to address approximate treatment of infinite-horizon problems
by the finite-horizon controller.
Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle,
- Proceedings of the 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2006), pp. 371-376, Heidelberg, Germany, September, 2006.
Author : Marc P. Deisenroth, Toshiyuki Ohtsuka, Florian Weissel, Dietrich Brunn, Uwe D. Hanebeck
Title : Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle
In : Proceedings of the 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2006)
Date : September 2006
Abstract : In this paper, an approach to the finite-horizon
optimal state-feedback control problem of nonlinear, stochastic,
discrete-time systems is presented. Starting from the dynamic
equation, the value function will be approximated
by means of Taylor series expansion up to second-order
derivatives. Moreover, the problem will be reformulated, such
that a minimum principle can be applied to the stochastic
problem. Employing this minimum principle, the optimal control
problem can be rewritten as a two-point boundary-value
problem to be solved at each time step of a shrinking horizon.
To avoid numerical problems, the two-point boundary-value
problem will be solved by means of a continuation method.
Thus, the curse of dimensionality of dynamic programming
is avoided, and good candidates for the optimal state-feedback
controls are obtained. The proposed approach will be evaluated
by means of a scalar example system.
