Honours Project: 2024 Winter April 23, 2024 - 11:22am

Carleton University - School of Computer Science Honours Project

Winter 2024

Integrating Diffusion Models with Discrete Action Spaces in Offline Reinforcement Learning

ABSTRACT

Reinforcement learning (RL) in offline settings faces challenges such as handling out-of-distribution actions and representing multimodal action distributions. This thesis presents Discrete Diffusion Q-Learning (D2QL), a framework that extends Diffusion Q-Learning to discrete-action environments. D2QL combines a Dirichlet Diffusion Score Model (DDSM), a stochastic differential equation-based diffusion model trained by behavior cloning, with Q-Learning for discrete offline RL. The aim is to make the expressive power of diffusion models available to a wider range of RL applications. Although empirical evaluation is still pending, D2QL establishes a theoretical foundation and outlines research directions for improving offline RL in discrete settings.