I am currently unemployed. I used to be an AI researcher in deep reinforcement learning, and I wrote two works that improve the optimization stability of off-policy, gradient-based Q-learning algorithms.
-
Stabilizing Q-Learning for Continuous Control
David Yu-Tung Hui
MSc Thesis, University of Montreal, 2022
I described two principles for stabilizing deep learning algorithms and applied them to deep reinforcement learning: 1) maximum entropy, for deriving loss functions, and 2) the neural tangent kernel, for deriving regularizers. In RL, maximum entropy justified the SACLite Q-learning algorithm, and a LayerNorm regularizer reduced its divergence, especially in high-dimensional continuous control.
[.pdf] [Errata]
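A minimal PyTorch sketch of the kind of LayerNorm-regularized critic the thesis describes. The class name, network shape, and hyperparameters here are illustrative assumptions, not taken from the thesis; only the placement of LayerNorm after each hidden linear layer reflects the regularizer above.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Continuous-control critic: Q(s, a) -> scalar. Illustrative sketch only."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.LayerNorm(hidden),  # regularizer: normalizes each hidden layer's pre-activations
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```

-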
Double Gumbel Q-Learning
David Yu-Tung Hui, Aaron Courville, Pierre-Luc Bacon
Spotlight at NeurIPS 2023
We showed that deep Q-learning has two heteroscedastic Gumbel noise sources arising from the parameter inaccuracy of deep neural networks. An algorithm accounting for both noise sources attained just under twice the aggregate asymptotic performance of the popular SAC baseline.
[.pdf] [Reviews] [Poster (.png)] [5-min talk] [1-hour seminar] [Code (GitHub)] [Errata]
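The extreme-value intuition behind Gumbel noise in Q-learning can be reproduced in a few lines of NumPy: the maximum over many noisy value estimates is biased upward, and its error is positively skewed like a Gumbel rather than symmetric like a Gaussian. This is a standard illustration under assumed toy numbers, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 50 actions with equal true value 0, each estimated with
# independent Gaussian noise. The error of max_a Q_hat(s, a) then approaches
# an extreme-value (Gumbel) distribution by the Fisher-Tippett theorem.
n_actions, n_samples = 50, 100_000
noise = rng.normal(0.0, 1.0, size=(n_samples, n_actions))
max_error = noise.max(axis=1)  # true max is 0, so this is pure estimation error

print(f"mean overestimation: {max_error.mean():.3f}")  # positive: the max is biased up
centered = max_error - max_error.mean()
print(f"skewness: {(centered**3).mean() / max_error.std()**3:.3f}")  # positive: Gumbel-like asymmetry
```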
The best way to contact me is by email. My email address is listed in one of my written works.