We consider the challenging scenario of contextual bandits with continuous actions and large context spaces. This is an increasingly important setting in personalised healthcare, where an agent must make dosing decisions based on a single image scan of a patient. In this paper, we first adapt a reinforcement learning (RL) algorithm for continuous control and show that it outperforms contextual bandit algorithms hand-crafted specifically for continuous action spaces. We demonstrate this empirically on a suite of standard benchmark datasets with vector contexts. Secondly, we show that our RL agent extends to continuous-action problems with large context spaces, outperforming previous methods on image contexts. Thirdly, we introduce a new contextual bandit test domain with a multi-dimensional continuous action space and image contexts, which existing tree-based methods cannot handle, and provide initial results with our RL agent.
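A contextual bandit can be viewed as a one-step episode: the agent observes a context, takes a single continuous action and receives a reward for that action only. That framing is one way to see why an RL algorithm for continuous control can be applied here. As a minimal sketch of the framing only, the Python below trains a Gaussian policy with a REINFORCE-style score-function update on a hypothetical linear "dosing" bandit with vector contexts. The environment (`w_true`, the squared-error reward), the function name `train_gaussian_policy` and all hyperparameters are illustrative assumptions, and REINFORCE is swapped in for simplicity; the abstract does not name the RL algorithm the authors adapt, so this is not their method.

```python
import numpy as np


def train_gaussian_policy(num_steps=10000, lr=0.005, sigma=0.3, seed=0):
    """One-step policy gradient on a HYPOTHETICAL linear 'dosing' bandit.

    Each round: observe context x, sample a continuous action (dose) a,
    receive reward r = -(a - a_star(x))**2, then update the policy.
    The episode ends after one action, so the return is just r.
    """
    rng = np.random.default_rng(seed)
    w_true = np.array([0.5, -0.3, 0.8])  # assumed optimal-dose weights (illustrative)
    theta = np.zeros_like(w_true)         # policy mean weights: mu(x) = theta @ x
    baseline = 0.0                        # running average reward (variance reduction)
    rewards = []
    for _ in range(num_steps):
        x = rng.standard_normal(w_true.shape[0])  # context vector
        mu = theta @ x
        a = mu + sigma * rng.standard_normal()    # Gaussian exploration around the mean
        r = -(a - w_true @ x) ** 2                # bandit feedback for the chosen action only
        # Score function of N(mu, sigma^2) w.r.t. theta: (a - mu) / sigma^2 * x
        grad_log_pi = (a - mu) / sigma**2 * x
        theta += lr * (r - baseline) * grad_log_pi
        baseline += 0.01 * (r - baseline)
        rewards.append(r)
    return theta, w_true, np.array(rewards)


theta, w_true, rewards = train_gaussian_policy()
print("learned:", np.round(theta, 2), "target:", w_true)
```

Because only the chosen action's reward is observed, the update relies on the policy's own exploration noise (`sigma`) rather than on a supervised target; replacing the linear mean with a convolutional network is the kind of change an image-context setting would need.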

Conference paper

Pages: 590-597