dyth/README.md

David Yu-Tung Hui 許宇同

I am currently unemployed. I used to be an AI researcher in deep reinforcement learning. I wrote two works improving the optimization stability of off-policy gradient-based Q-learning algorithms.

  1. Stabilizing Q-Learning for Continuous Control
    David Yu-Tung Hui
    MSc Thesis, University of Montreal, 2022
    I described two principles for stabilizing deep learning algorithms and applied them to deep reinforcement learning: 1) maximum entropy for deriving loss functions and 2) the neural tangent kernel for deriving regularizers. In RL, maximum entropy justified the SACLite Q-learning algorithm, and the LayerNorm regularizer reduced its divergence, especially in high-dimensional continuous control.
    [.pdf] [Errata]

  2. Double Gumbel Q-Learning
    David Yu-Tung Hui, Aaron Courville, Pierre-Luc Bacon
    Spotlight at NeurIPS 2023
    We showed that deep Q-learning has two heteroscedastic Gumbel noise sources arising from the parameter inaccuracy of deep neural networks. An algorithm accounting for these noise sources attained just under 2 times the aggregate asymptotic performance of the popular SAC baseline.
    [.pdf] [Reviews] [Poster (.png)] [5-min talk] [1-hour seminar] [Code (GitHub)] [Errata]

The best way to contact me is email. My email address is listed in one of my written works.

Pinned

  1. doublegum

    NeurIPS 2023 Spotlight

    Python, 10 stars, 5 forks

  2. causal-entropic-forces

    Python reimplementation of Wissner-Gross & Freer, 2013

    Jupyter Notebook, 13 stars, 5 forks