Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms


Stochastic approximation algorithms are used to approximate solutions to fixed-point equations that involve expectations of functions with respect to possibly unknown distributions. The most famous examples today are the TD- and Q-learning algorithms. This three-hour tutorial lecture series, courtesy of the Simons Institute for the Theory of Computing at UC Berkeley, consists of two parts (a minimal illustration of the basic iteration follows the outline below):

  1. The basics: an overview of stochastic approximation, with a focus on optimizing the rate of convergence. An algorithm that achieves the best rate is analogous to the Newton-Raphson method.
  2. Leftovers and review from Part 1, followed by applications to reinforcement learning for discounted-cost optimal control and for optimal stopping problems. The theory from Part 1 leads to the new Zap Q-learning algorithm. Analysis suggests that its transient behavior closely matches that of a deterministic Newton-Raphson implementation, and numerical experiments confirm super-fast convergence (a rough sketch of the matrix-gain idea appears below).
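
To make Part 1's setting concrete, here is a minimal sketch of a Robbins-Monro-style stochastic approximation iteration; the toy problem, names, and step-size choice are illustrative assumptions, not taken from the lectures. It solves a fixed-point equation theta = E[g(theta, W)] using only noisy samples of g, never the expectation itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy map whose fixed point is the (unknown) mean: E[g(theta, W)] = E[W].
def g(theta, w):
    return w

theta = 0.0
for n in range(1, 100_000):
    w = rng.normal(loc=2.5, scale=1.0)    # sample from the "unknown" distribution
    a_n = 1.0 / n                         # diminishing step size
    theta += a_n * (g(theta, w) - theta)  # stochastic fixed-point iteration

print(theta)  # converges to E[W] = 2.5
```

With the step size a_n = 1/n this particular iteration reduces to the running sample average; the choice of step size (or gain) is precisely what governs the convergence rate that Part 1 focuses on.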

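As a rough, hedged illustration of the matrix-gain idea behind a Zap-style update (stochastic approximation with a Newton-Raphson-like gain estimated on a faster timescale), the sketch below solves a toy linear root-finding problem; the matrices, noise levels, and two-timescale step sizes are my own assumptions, not the lectures' Q-learning application:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A_bar = np.array([[2.0, 0.3, 0.0],
                  [0.1, 1.5, 0.2],
                  [0.0, 0.4, 3.0]])
b_bar = np.array([1.0, -2.0, 0.5])
# Root of f(theta) = E[A] theta - E[b] is theta* = A_bar^{-1} b_bar.

theta = np.zeros(d)
A_hat = np.eye(d)  # running estimate of the mean derivative E[A]
for n in range(1, 20_000):
    A_n = A_bar + 0.1 * rng.standard_normal((d, d))  # noisy derivative observation
    b_n = b_bar + 0.1 * rng.standard_normal(d)
    f_n = A_n @ theta - b_n               # noisy observation of f at theta
    beta = 1.0 / n ** 0.85                # faster timescale for the gain matrix
    alpha = 1.0 / n                       # slower timescale for the parameter
    A_hat += beta * (A_n - A_hat)
    theta -= alpha * np.linalg.solve(A_hat, f_n)  # Newton-Raphson-like step

print(theta)                              # approximately...
print(np.linalg.solve(A_bar, b_bar))      # ...the true root
```

The key design choice in this sketch is the two timescales: the derivative estimate A_hat adapts faster than the parameter theta, so the parameter update behaves like a deterministic Newton-Raphson step in the limit.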
Slides for the lectures can be downloaded HERE.

 

