User:Deisenroth



I spend almost all my time in Cambridge, where I work on Gaussian process models for reinforcement learning together with Carl Rasmussen (University of Cambridge) and Jan Peters (Max Planck Institute for Biological Cybernetics).

Please see my Cambridge homepage for all research-related material (papers, demos, code, ...).


Publications

Ryan Turner, Marc Peter Deisenroth, Carl Edward Rasmussen,
System Identification in Gaussian Process Dynamical Systems,
Nonparametric Bayes Workshop at NIPS 2009, Whistler, Canada, December, 2009.
Marc P. Deisenroth, Carl E. Rasmussen,
Efficient Reinforcement Learning for Motor Control,
Proceedings of the 10th International PhD Workshop on Systems and Control, Hluboka nad Vltavou, Czech Republic, September, 2009.
Marc Peter Deisenroth, Carl Edward Rasmussen,
Bayesian Inference for Efficient Learning in Control,
Multidisciplinary Symposium on Reinforcement Learning (MSRL), Montreal, Canada, June, 2009.
Marc P. Deisenroth, Marco F. Huber, Uwe D. Hanebeck,
Analytic Moment-based Gaussian Process Filtering,
26th International Conference on Machine Learning (ICML 2009), Montreal, Canada, June, 2009.
Abstract
We propose an analytic moment-based filter for nonlinear stochastic
dynamic systems modeled by Gaussian processes. Exact expressions for the
expected value and the covariance matrix are provided for both the
prediction step and the filter step, where an additional Gaussian
assumption is exploited in the latter case. Our filter does not require
further approximations. In particular, it avoids finite-sample
approximations. We compare the filter to a variety of Gaussian filters,
that is, the EKF, the UKF, and the recent GP-UKF proposed by Ko et al.
(2007).
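To make the prediction step concrete, the sketch below computes the exact mean and variance of a GP prediction f(x) when the input itself is uncertain, x ~ N(mu, s2), without any sampling. It is a minimal one-dimensional illustration under stated assumptions: a squared-exponential kernel with fixed, hand-picked hyperparameters, scalar inputs, and only the prediction step (the filter step, multivariate states, and hyperparameter learning treated in the paper are not shown). The function names are hypothetical.

```python
import numpy as np


def se_kernel(a, b, alpha2, ell):
    """Squared-exponential kernel alpha2 * exp(-(a - b)^2 / (2 ell^2)) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return alpha2 * np.exp(-0.5 * d**2 / ell**2)


def gp_moments_gaussian_input(X, y, mu, s2, alpha2=1.0, ell=1.0, noise=1e-2):
    """Exact mean and variance of a GP prediction f(x) when x ~ N(mu, s2).

    Scalar inputs, SE kernel, fixed hyperparameters (illustrative assumptions).
    """
    K = se_kernel(X, X, alpha2, ell) + noise * np.eye(len(X))
    K_inv = np.linalg.inv(K)
    beta = K_inv @ y

    # q_i = E_x[k(x, X_i)]: a Gaussian integral with a closed form.
    scale = ell**2 + s2
    q = alpha2 * np.sqrt(ell**2 / scale) * np.exp(-0.5 * (mu - X) ** 2 / scale)
    mean = q @ beta

    # Q_ij = E_x[k(x, X_i) k(x, X_j)], also closed form (the product of two
    # SE kernels is an SE kernel centred at the midpoint z of X_i and X_j).
    z = 0.5 * (X[:, None] + X[None, :])
    half = 0.5 * ell**2 + s2
    Q = (alpha2**2
         * np.exp(-((X[:, None] - X[None, :]) ** 2) / (4 * ell**2))
         * np.sqrt(0.5 * ell**2 / half)
         * np.exp(-0.5 * (mu - z) ** 2 / half))

    # Law of total variance: E[var_f(x)] + var[m(x)].
    expected_var = alpha2 - np.trace(K_inv @ Q)
    var = expected_var + beta @ Q @ beta - mean**2
    return mean, var


# Usage: GP fitted to noisy-free samples of sin, queried at an uncertain input.
X = np.linspace(-3.0, 3.0, 15)
y = np.sin(X)
mean, var = gp_moments_gaussian_input(X, y, mu=0.5, s2=0.2)
```

The closed forms exist because Gaussian-shaped kernels multiplied by a Gaussian input density integrate analytically; a brute-force quadrature over x reproduces the same two numbers, which makes a convenient sanity check.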
Marc P. Deisenroth, Carl E. Rasmussen, Jan Peters,
Gaussian Process Dynamic Programming,
Neurocomputing, 72(7-9):1508-1524, March, 2009.
Abstract
Reinforcement learning (RL) and optimal control of systems with continuous
states and actions require approximation techniques in most interesting cases.
In this article, we introduce Gaussian process dynamic programming (GPDP), an
approximate value-function based RL algorithm. We consider both a classic optimal
control problem, where problem-specific prior knowledge is available,
and a classic RL problem, where only very general priors can be used.
For the classic optimal control problem, GPDP models the unknown value
functions with Gaussian processes and generalizes dynamic programming to continuous-valued
states and actions. For the RL problem, GPDP starts from a given initial state
and explores the state space using Bayesian active learning. To
design a fast learner, available data has to be used efficiently.
Hence, we propose to learn probabilistic models of the a priori unknown
transition dynamics and the value functions on the fly. In both
cases, we successfully apply the resulting continuous-valued controllers
to the under-actuated pendulum swing up and analyze the performance of the
suggested algorithms. It turns out that GPDP uses data very efficiently and
can be applied to problems where classic dynamic programming would be cumbersome.
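The abstract does not spell out the exploration criterion. Purely as an illustration of how a GP's predictive uncertainty can drive Bayesian active learning, the sketch below scores candidate states with an assumed upper-confidence utility, mean + kappa * std, computed from a GP model of the value function. The utility, kernel, hyperparameters, `kappa`, and function names are all assumptions, not the criterion used in the article.

```python
import numpy as np


def se_kernel(a, b, alpha2=1.0, ell=0.5):
    """Squared-exponential kernel for scalar inputs."""
    d = a[:, None] - b[None, :]
    return alpha2 * np.exp(-0.5 * d**2 / ell**2)


def gp_posterior(X, y, Xs, alpha2=1.0, ell=0.5, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at test inputs Xs."""
    K = se_kernel(X, X, alpha2, ell) + noise * np.eye(len(X))
    Ks = se_kernel(Xs, X, alpha2, ell)
    mean = Ks @ np.linalg.solve(K, y)
    var = alpha2 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)


def select_next_state(X, y, candidates, kappa=2.0):
    """Return the candidate maximizing mean + kappa * std (hypothetical utility).

    X, y: states visited so far and their (estimated) values.
    """
    mean, var = gp_posterior(X, y, candidates)
    utility = mean + kappa * np.sqrt(var)
    return candidates[np.argmax(utility)]


# Usage: all visited states lie on the left, so exploration points to the right.
visited = np.array([-1.0, -0.8, -0.6])
values = np.zeros(3)
candidates = np.linspace(-1.0, 1.0, 21)
x_next = select_next_state(visited, values, candidates)
```

With `kappa = 0` the choice is purely greedy with respect to the predicted value; larger `kappa` favors states the GP knows little about, typically those far from the states already visited.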
Marc P. Deisenroth, Jan Peters, Carl E. Rasmussen,
Approximate Dynamic Programming with Gaussian Processes,
Proceedings of the 2008 American Control Conference (ACC 2008), pp. 4480-4485, Seattle, Washington, USA, June, 2008.
Abstract
In general, it is difficult to determine an optimal closed-loop policy
in nonlinear control problems with continuous-valued state and control
domains. Hence, approximations are often inevitable. The standard
method of discretizing states and controls suffers from the curse
of dimensionality and strongly depends on the chosen temporal sampling
rate. In this paper, we introduce Gaussian process dynamic programming
(GPDP) and determine an approximate globally optimal closed-loop
policy. In GPDP, value functions in the Bellman recursion of the
dynamic programming algorithm are modeled using Gaussian processes.
GPDP returns an optimal state-feedback for a finite set of states.
Based on these outcomes, we learn a possibly discontinuous closed-loop
policy on the entire state space by switching between two independently
trained Gaussian processes. A binary classifier selects one Gaussian
process to predict the optimal control signal. We show that GPDP
is able to yield an almost optimal solution to an LQ problem using
few sample points. Moreover, we successfully apply GPDP to the underpowered
pendulum swing up, a complex nonlinear control problem.
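A much-simplified sketch of the idea of modeling value functions in the Bellman recursion with GPs, on an assumed scalar LQ problem (x' = a x + b u, stage cost q x^2 + r u^2, all coefficients set to 1) where the exact answer from the Riccati recursion is available for comparison. Assumptions: known deterministic dynamics, fixed squared-exponential hyperparameters, a grid of candidate controls instead of continuous optimization, and no second GP or classifier for discontinuous policies. `gpdp_lq` and the helper names are hypothetical.

```python
import numpy as np


def se_kernel(a, b, alpha2=1.0, ell=0.5):
    """Squared-exponential kernel for scalar inputs (fixed, assumed hyperparameters)."""
    d = a[:, None] - b[None, :]
    return alpha2 * np.exp(-0.5 * d**2 / ell**2)


def gp_mean(X, y, Xs, noise=1e-6):
    """Posterior mean of a zero-mean GP fitted to (X, y), evaluated at Xs."""
    K = se_kernel(X, X) + noise * np.eye(len(X))
    return se_kernel(Xs, X) @ np.linalg.solve(K, y)


def gpdp_lq(a=1.0, b=1.0, q=1.0, r=1.0, horizon=5):
    """Backward GPDP-style sweep for x' = a*x + b*u with stage cost q*x^2 + r*u^2.

    Returns the support states, the value V_0 at those states, and the
    first-stage greedy control at each support state.
    """
    X = np.linspace(-1.0, 1.0, 21)   # support states
    U = np.linspace(-1.5, 1.5, 61)   # candidate controls (grid search)
    V = q * X**2                     # terminal cost at the support states
    policy = None
    for _ in range(horizon):
        x_next = a * X[:, None] + b * U[None, :]            # (states, controls)
        # GP model of V_{k+1}, fitted to its values at the support states.
        v_next = gp_mean(X, V, x_next.ravel()).reshape(x_next.shape)
        Q = q * X[:, None] ** 2 + r * U[None, :] ** 2 + v_next
        # Disallow controls that leave the support set, where the GP extrapolates.
        Q[np.abs(x_next) > 1.0] = np.inf
        best = np.argmin(Q, axis=1)
        V = Q[np.arange(len(X)), best]
        policy = U[best]
    return X, V, policy


X, V, policy = gpdp_lq()
```

For these coefficients the scalar Riccati recursion P_k = q + a^2 P_{k+1} - (a b P_{k+1})^2 / (r + b^2 P_{k+1}), started from P_5 = q, gives P_0 ≈ 1.618, so `V` can be compared directly with 1.618 * X**2. Masking controls that leave the support set is a design choice of this sketch: a zero-mean GP reverts toward its prior far from data, which would otherwise make distant states look spuriously cheap.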
Marc P. Deisenroth, Carl E. Rasmussen, Jan Peters,
Model-Based Reinforcement Learning with Continuous States and Actions,
Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2008), pp. 19-24, Bruges, Belgium, April, 2008.
Abstract
Finding an optimal policy in a reinforcement learning (RL) framework
with continuous state and action spaces is challenging. Approximate
solutions are often inevitable. GPDP is an approximate dynamic programming
algorithm based on Gaussian process (GP) models for the value functions.
In this paper, we extend GPDP to the case of unknown transition dynamics.
After building a GP model for the transition dynamics, we apply GPDP
to this model and determine a continuous-valued policy in the entire
state space. We apply the resulting controller to the underpowered
pendulum swing up. Moreover, we compare our results on this RL task
to a nearly optimal discrete DP solution in a fully known environment.
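A minimal sketch of the first step, learning a GP model of the transition dynamics from observed transitions (state, control) -> next state. Everything specific here is an assumption for illustration: the toy system `toy_dynamics` standing in for the unknown dynamics, the fixed ARD squared-exponential hyperparameters, the zero prior mean, and the class name `GPDynamicsModel`. In practice the hyperparameters would usually be learned, for example by maximizing the marginal likelihood, before running GPDP on the model.

```python
import numpy as np


def se_kernel(A, B, alpha2=1.0, ells=(0.5, 0.5)):
    """ARD squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    D = (A[:, None, :] - B[None, :, :]) / np.asarray(ells)
    return alpha2 * np.exp(-0.5 * np.sum(D**2, axis=-1))


class GPDynamicsModel:
    """Zero-mean GP mapping (state, control) -> next state, with fixed hyperparameters."""

    def __init__(self, noise=1e-4):
        self.noise = noise

    def fit(self, XU, x_next):
        self.XU = XU
        K = se_kernel(XU, XU) + self.noise * np.eye(len(XU))
        self.L = np.linalg.cholesky(K)                     # K = L L^T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, x_next))
        return self

    def predict(self, XU_star):
        """Predictive mean and variance of the latent next state."""
        Ks = se_kernel(XU_star, self.XU)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = 1.0 - np.sum(v**2, axis=0)                   # alpha2 = 1 in se_kernel
        return mean, np.maximum(var, 0.0)


def toy_dynamics(x, u):
    """Hypothetical 'unknown' system used only to generate training data."""
    return x + 0.1 * u - 0.1 * np.sin(x)


rng = np.random.default_rng(0)
XU = rng.uniform(-1.0, 1.0, size=(200, 2))                 # columns: state, control
model = GPDynamicsModel().fit(XU, toy_dynamics(XU[:, 0], XU[:, 1]))
mean, var = model.predict(np.array([[0.2, 0.3], [3.0, 3.0]]))
```

The predictive variance is small near the training data and close to the prior variance far from it, which is the kind of confidence information a planner operating on the learned model can take into account.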
Carl E. Rasmussen, Marc P. Deisenroth,
Probabilistic Inference for Fast Learning in Control,
Recent Advances in Reinforcement Learning. Proceedings of the 8th European Workshop on Reinforcement Learning (EWRL 2008), 5323:229-242, Springer-Verlag, November, 2008.
Abstract
We provide a novel framework for very fast model-based reinforcement
learning in continuous state and action spaces. The framework requires
probabilistic models that explicitly characterize their levels of confidence.
Within this framework, we use flexible, non-parametric models to describe the
world based on previously collected experience. We demonstrate learning on the
cart-pole problem in a setting where we provide very limited prior knowledge
about the task. Learning progresses rapidly, and a good policy is found after
only a handful of iterations.
Marc P. Deisenroth, Florian Weissel, Toshiyuki Ohtsuka, Uwe D. Hanebeck,
Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces,
Proceedings of the 2007 European Control Conference (ECC 2007), Kos, Greece, July, 2007.
Abstract
A novel online-computation approach to optimal control of nonlinear,
noise-affected systems with continuous state and control spaces is
presented. In the proposed algorithm, system noise is explicitly
incorporated into the control decision. This leads to superior results
compared to state-of-the-art nonlinear controllers that neglect this
influence. The solution of an optimal nonlinear controller for a
corresponding deterministic system is employed to find a meaningful
state space restriction. This restriction is obtained by means of
approximate state prediction using the noisy system equation. Within
this constrained state space, an optimal closed-loop solution for
a finite decision-making horizon (prediction horizon) is determined
within an adaptively restricted optimization space. Interleaving
stochastic dynamic programming and value function approximation yields
a solution to the considered optimal control problem. The enhanced
performance of the proposed discrete-time controller is illustrated
by means of a scalar example system. Nonlinear model predictive control
is applied so that the finite-horizon controller can approximately treat
infinite-horizon problems.
Marc P. Deisenroth, Toshiyuki Ohtsuka, Florian Weissel, Dietrich Brunn, Uwe D. Hanebeck,
Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle,
Proceedings of the 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2006), pp. 371-376, Heidelberg, Germany, September, 2006.
Abstract
In this paper, an approach to the finite-horizon
optimal state-feedback control problem of nonlinear, stochastic,
discrete-time systems is presented. Starting from the dynamic
equation, the value function will be approximated
by means of Taylor series expansion up to second-order
derivatives. Moreover, the problem will be reformulated, such
that a minimum principle can be applied to the stochastic
problem. Employing this minimum principle, the optimal control
problem can be rewritten as a two-point boundary-value
problem to be solved at each time step of a shrinking horizon.
To avoid numerical problems, the two-point boundary-value
problem will be solved by means of a continuation method.
Thus, the curse of dimensionality of dynamic programming
is avoided, and good candidates for the optimal state-feedback
controls are obtained. The proposed approach will be evaluated
by means of a scalar example system.
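The abstract does not give the expansion explicitly. As a generic illustration only, the sketch below shows the standard second-order Taylor approximation of an expected value under uncertainty, E[V(x)] ≈ V(mu) + 0.5 * trace(H(mu) Sigma), where H is the Hessian of V; it is exact for quadratic V. Whether the paper applies the expansion in exactly this form, and how it is combined with the minimum principle and the continuation method, is not shown; the function name and test functions are hypothetical.

```python
import numpy as np


def expected_value_taylor2(V, hessian, mu, Sigma):
    """Second-order Taylor approximation of E[V(x)] when x has mean mu and covariance Sigma.

    Expanding V around mu, the first-order term vanishes in expectation, leaving
    E[V(x)] ~= V(mu) + 0.5 * trace(H(mu) @ Sigma).
    """
    return V(mu) + 0.5 * np.trace(hessian(mu) @ Sigma)


# Quadratic value function V(x) = x^T A x: the approximation is exact.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
mu = np.array([0.3, -0.2])
Sigma = np.array([[0.1, 0.02], [0.02, 0.05]])
approx_quad = expected_value_taylor2(lambda x: x @ A @ x, lambda x: 2.0 * A, mu, Sigma)
exact_quad = mu @ A @ mu + np.trace(A @ Sigma)

# Non-quadratic V(x) = cos(x) with Gaussian x: the exact value is cos(m) * exp(-s2 / 2).
approx_cos = expected_value_taylor2(
    lambda x: np.cos(x[0]),
    lambda x: np.array([[-np.cos(x[0])]]),
    np.array([0.3]),
    np.array([[0.1]]),
)
exact_cos = np.cos(0.3) * np.exp(-0.05)
```

For the cosine case the residual error is of fourth order in the standard deviation, so the approximation degrades as the noise grows.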