[Probabilidad-Estadistica-Seminario] Probability and Statistics Seminar - Martín Zubeldía (ISyE, University of Minnesota, USA)

Wed Aug 31 08:00:28 -03 2022

Probability and Statistics Seminar
----------------------------------------

Title: "Anytime exponential concentration of contractive stochastic approximation: Additive and multiplicative noise"

Speaker: Martín Zubeldía (ISyE, University of Minnesota, USA)

Abstract:

In this talk, we study stochastic approximation (SA) algorithms under a
contractive operator with respect to an arbitrary norm. We consider two settings
where the iterates are potentially unbounded: additive sub-Gaussian noise, and
bounded multiplicative noise. We obtain concentration bounds on the convergence
errors, and show that these errors have sub-Gaussian tails. Moreover, our bounds
hold anytime in the sense that the entire sample path lies within a tube of
decaying radius with high probability. To establish these results, we first
bound the moment generating function of the generalized Moreau envelope of the
error, which serves as a Lyapunov function. Then, we construct an exponential
supermartingale and use Ville's maximal inequality to obtain anytime exponential
concentration bounds. To overcome the challenge of having multiplicative noise,
we develop a bootstrapping argument to iteratively improve an initially loose
concentration bound and obtain a much tighter one.

Our results enable us to provide anytime high-probability bounds for a large
class of reinforcement learning algorithms. Since a special case of contractive
SA with multiplicative noise is linear SA with bounded matrices that are Hurwitz
in expectation but not almost surely Hurwitz, we establish high-probability
bounds for various TD-learning algorithms (such as on-policy TD with linear
function approximation, and off-policy TD) in one shot. To the best of our
knowledge, exponential concentration bounds for off-policy TD-learning have not
been established in the literature due to the challenge of handling such
multiplicative noise. Moreover, we also provide anytime high-probability bounds
for the popular Q-learning algorithm.

This is joint work with Zaiwei Chen (Caltech) and Siva Theja Maguluri (Georgia
Tech).
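
For readers new to the setting, here is a minimal sketch of a contractive
stochastic approximation recursion with additive Gaussian noise. It is not taken
from the talk. The operator F(x) = GAMMA * P x + B, the matrix P, the vector B,
the step sizes 2/(k+2), the noise level, and all function names are
illustrative assumptions chosen only so that F is a contraction in the sup norm.

```python
import random

GAMMA = 0.5                      # contraction factor (assumed for illustration)
P = [[0.9, 0.1], [0.2, 0.8]]     # row-stochastic matrix (assumed)
B = [1.0, 2.0]                   # offset vector (assumed)


def apply_operator(x):
    """Hypothetical operator F(x) = GAMMA * P x + B, a GAMMA-contraction in the sup norm."""
    return [GAMMA * sum(p * xj for p, xj in zip(row, x)) + b
            for row, b in zip(P, B)]


def fixed_point(iterations=200):
    """Approximate the fixed point x* = F(x*) by plain fixed-point iteration."""
    x = [0.0, 0.0]
    for _ in range(iterations):
        x = apply_operator(x)
    return x


def sup_norm_error(x, x_star):
    """Sup-norm distance between the iterate and the fixed point."""
    return max(abs(a - b) for a, b in zip(x, x_star))


def run_sa(num_steps=5000, sigma=1.0, seed=0):
    """Run x_{k+1} = x_k + a_k * (F(x_k) - x_k + w_k) with i.i.d. Gaussian w_k.

    Returns the list of sup-norm errors, with the initial error at index 0.
    """
    rng = random.Random(seed)
    x_star = fixed_point()
    x = [10.0, -10.0]            # arbitrary starting point (assumed)
    errors = [sup_norm_error(x, x_star)]
    for k in range(num_steps):
        step = 2.0 / (k + 2)     # decaying step size (assumed schedule)
        fx = apply_operator(x)
        x = [xi + step * (fi - xi + rng.gauss(0.0, sigma))
             for xi, fi in zip(x, fx)]
        errors.append(sup_norm_error(x, x_star))
    return errors


if __name__ == "__main__":
    errs = run_sa()
    for k in (0, 10, 100, 1000, 5000):
        print(f"step {k:5d}: sup-norm error = {errs[k]:.4f}")
```

Running the script prints the error along a single sample path. The abstract's
results concern bounds that hold for the entire path with high probability,
which a single simulated run can only illustrate, not verify.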
--------------------------------------------------------------------------------
Friday, September 2 at 10:30, via Zoom

Contact: Alejandro Cholaquidis - acholaquidis at hotmail.com
--------------------------------------------------------------------------------
The talk will be held on Zoom only.

Virtual meeting details:

https://salavirtual-udelar.zoom.us/j/81121640094?pwd=SWVsZ1V2TTI5aDZob0NTdXVRVzhVZz09

Seminar page: https://pye.cmat.edu.uy/seminario
Group page: https://pye.cmat.edu.uy/home

YouTube channel: https://www.youtube.com/channel/UCOPZEOrLSAYPz2qCAL-KqMg/about
--------------------------------------------------------------------------------
More seminars at: http://www.cmat.edu.uy/seminarios