Carleton University - School of Computer Science Honours Project
Winter 2024
Integrating Diffusion Models with Discrete Action Spaces in Offline Reinforcement Learning
ABSTRACT
Offline reinforcement learning (RL) faces challenges such as handling out-of-distribution actions and representing multimodal action distributions. This thesis presents Discrete Diffusion Q-Learning (D2QL), a framework that extends Diffusion Q-Learning to discrete-action environments. D2QL combines a Dirichlet Diffusion Score Model (DDSM), a diffusion model based on stochastic differential equations and used here for behavior cloning, with Q-Learning for discrete offline RL. The aim is to make the expressive power of diffusion models available to a wider range of RL applications. Although empirical evaluation is still pending, the D2QL framework establishes a theoretical foundation and outlines potential research directions for improving offline RL in discrete settings.