Braham Snyder

(rhymes with "Graham" and "Sam")


I create more efficient machine learning algorithms for sequential decision-making. I'm focusing on reinforcement learning (RL) because it's likely important for outperforming the best decisions in prior data.

One of my goals is to fix the instability that arises from combining three standard principles for efficiency in RL, known as the deadly triad. I think moving closer to Bellman residual minimization (also known as Bellman error minimization) might be part of the simplest solution. Minimizing those residuals with gradient descent is often called the residual gradient algorithm.
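
As a rough illustration of the distinction (a minimal PyTorch-style sketch; the network, states, and constants are placeholders, not taken from my papers): the conventional semi-gradient treats the bootstrapped target as a constant, while the residual gradient differentiates through it, i.e., it runs true gradient descent on the squared Bellman residual.

    import torch

    # Toy setup: a linear Q-function over 4 one-hot states and 2 actions
    # (sizes and values are arbitrary, for illustration only).
    q_net = torch.nn.Linear(4, 2)
    s, s_next = torch.eye(4)[0], torch.eye(4)[1]
    a, r, gamma = 0, 1.0, 0.99

    q_sa = q_net(s)[a]
    target = r + gamma * q_net(s_next).max()

    # Semi-gradient (conventional TD / Q-learning): detach the target,
    # so no gradient flows through the bootstrapped value.
    semi_loss = 0.5 * (q_sa - target.detach()) ** 2

    # Residual gradient: differentiate through the target as well,
    # i.e., gradient descent on the squared Bellman residual itself.
    residual_loss = 0.5 * (q_sa - target) ** 2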

I'm a PhD student, fortunate to be advised by Chen-Yu Wei at UVA. I'm also fortunate to have worked with similarly great people beforehand — most recently, I was advised by Yuke Zhu and collaborated with Amy Zhang for my MS at UT Austin.


Email | Google Scholar | Twitter

profile photo
Selected Works:
Target Rate Optimization: Avoiding Iterative Error Exploitation
Braham Snyder, Amy Zhang, Yuke Zhu
NeurIPS Foundation Models for Decision Making Workshop, 2023
preprint, 2024
paper | (code forthcoming)

To lessen the instability of conventional deadly triad algorithms, we optimize the rate at which their bootstrapped targets are updated. Our main approach to this target rate optimization (TRO) uses a residual gradient. Changing nothing else, TRO increases return on almost half of the domains we test, by up to ~3x.

Towards Convergent Offline Reinforcement Learning
Braham Snyder
MS thesis, UT Austin, 2023
paper

Raisin (below) with a higher-level abstract and introduction and an updated conclusion. The thesis includes more of my ideas for fixing the residual gradient and discusses preliminary experiments in those directions.

Raisin: Residual Algorithms for Versatile Offline Reinforcement Learning
Braham Snyder, Yuke Zhu
NeurIPS Offline Reinforcement Learning Workshop, 2022
preprint, 2023
paper | ICLR reviews (rejected, top ~30%)

We revisit residual algorithms, weighted averages of the semi-gradient (the conventional approach) and the residual gradient. We add residual algorithms to a simple and high-scoring but inefficient offline algorithm. Changing nothing else, tuning the residual weight hyperparameter reduces the number of neural networks required by 50x on a standard benchmark domain.
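
For intuition, here is a minimal sketch of the kind of loss such an interpolation corresponds to (variable names and the default weight are illustrative, not taken from the paper):

    import torch

    def residual_algorithm_loss(q_sa, target, eta=0.5):
        # eta = 0 recovers the semi-gradient (detached target);
        # eta = 1 recovers the residual gradient (full gradient).
        semi = 0.5 * (q_sa - target.detach()) ** 2
        residual = 0.5 * (q_sa - target) ** 2
        return (1 - eta) * semi + eta * residual

Because gradients are linear, mixing the two losses this way mixes the semi-gradient and residual-gradient updates with the same weight.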










Page template from Jon Barron