Braham Snyder
(rhymes with "Graham" and "Sam")
I create more efficient machine learning algorithms for sequential
decision-making. I'm focusing on reinforcement learning (RL) because
it's likely important for outperforming the best decisions in prior
data.
One of my goals is to fix the instability that arises from
combining three standard ingredients of efficient RL: function
approximation, bootstrapping, and off-policy training. That is,
to fix the deadly triad. I think moving closer to Bellman
residual minimization (AKA Bellman error minimization) might be
part of the simplest solution. Minimizing those residuals with
gradient descent is often called the residual gradient
algorithm.
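For concreteness, a minimal PyTorch sketch of that loss
(illustrative names, not code from any paper below): with
residual=True the gradient flows through the bootstrapped
target, giving the residual gradient; with residual=False the
target is detached, giving the usual semi-gradient.

import torch

def bellman_residual_loss(q_net, batch, gamma=0.99, residual=True):
    # batch: (states, actions, rewards, next_states, dones) tensors;
    # actions are int64 indices, dones are 0/1 floats.
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    next_q = q_net(s_next).max(dim=1).values
    if not residual:
        # Semi-gradient: treat the bootstrapped target as a constant.
        next_q = next_q.detach()
    target = r + gamma * (1.0 - done) * next_q
    # The residual gradient descends this squared Bellman residual while
    # differentiating through both Q(s, a) and the bootstrapped term.
    return ((target - q_sa) ** 2).mean()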
I'm a PhD student, fortunate to be advised by
Chen-Yu Wei
at UVA. I'm also fortunate to have worked with similarly great
people beforehand — most recently, I was advised by
Yuke Zhu and
collaborated with Amy
Zhang for my MS at UT Austin.
email
google scholar
twitter
bluesky
|
HANQ: Hypergradients, Asymmetry, and Normalization for
Fast and Stable Deep Q-Learning
Braham Snyder,
Chen-Yu Wei
Reinforcement Learning Journal (RLJ) / Reinforcement Learning Conference (RLC), 2025
paper
code (forthcoming)
Inspired by self-supervised learning (SSL), we empirically
arrive at three modifications to the standard deep
Q-network (DQN) — no two of
which work well alone in our ablations. Aligning with prior
work in SSL, HANQ (pronounced "hank") avoids DQN's
target network, uses the same number of hyperparameters as
DQN, and yet matches or exceeds DQN's performance in our
offline RL experiments on three out of four environments.
|
Target Rate Optimization: Avoiding Iterative Error Exploitation
Braham Snyder,
Amy Zhang,
Yuke Zhu
NeurIPS Foundation Models for Decision Making Workshop, 2023
preprint, 2024
paper
To lessen the instability of conventional deadly triad algorithms,
we optimize the rate at which their bootstrapped targets are
updated. Our main approach to this Target Rate Optimization (TRO)
uses a residual gradient. Changing nothing else, TRO increases
(final) return on almost half of the domains we test, by up to ~.
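As a rough illustration (hypothetical names, not the paper's
code), the quantity TRO tunes is the soft target-update rate
that conventional deep Q-learning fixes as a hyperparameter:

import torch

@torch.no_grad()
def soft_target_update(q_net, target_net, tau):
    # target <- (1 - tau) * target + tau * online; tau is the target rate.
    # Standard practice fixes tau (e.g., 0.005); TRO instead optimizes it
    # during training, with a residual gradient as the main approach.
    for p_targ, p in zip(target_net.parameters(), q_net.parameters()):
        p_targ.mul_(1.0 - tau).add_(p, alpha=tau)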
|
Towards Convergent Offline Reinforcement Learning
Braham Snyder
MS thesis, UT Austin, 2023
paper
Raisin with a higher-level abstract and introduction, and an updated
conclusion. Includes more of my ideas for fixing the residual
gradient, and discusses preliminary experiments in those
directions.
|
Raisin: Residual Algorithms for Versatile Offline Reinforcement Learning
Braham Snyder,
Yuke Zhu
NeurIPS Offline Reinforcement Learning Workshop, 2022
preprint, 2023
paper
ICLR reviews (rejected, top ~30%)
We revisit residual algorithms, weighted averages of the
semi-gradient (the conventional approach) and the residual
gradient. We add residual algorithms to a simple and
high-scoring but inefficient offline algorithm. Changing
nothing else, tuning the residual weight hyperparameter
reduces the number of neural networks required by on a
standard benchmark domain.
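A minimal sketch of that interpolation (Baird-style residual
algorithms; illustrative names, not the Raisin codebase),
mirroring the Bellman-residual sketch above:

import torch

def residual_algorithm_loss(q_net, batch, w, gamma=0.99):
    # w = 0 recovers the semi-gradient; w = 1 recovers the residual gradient.
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    next_q = q_net(s_next).max(dim=1).values
    # Let only a w-weighted fraction of the gradient flow through the
    # bootstrapped term; the resulting update is the w-weighted average
    # of the residual gradient and the semi-gradient.
    next_q = w * next_q + (1.0 - w) * next_q.detach()
    target = r + gamma * (1.0 - done) * next_q
    return ((target - q_sa) ** 2).mean()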
|