[Probabilidad-Estadistica-Seminario] Probability and Statistics Seminar - Martín Zubeldía (ISyE, University of Minnesota, USA)
seminarios at cmat.edu.uy
Wed Aug 31 08:00:28 -03 2022
Probability and Statistics Seminar
----------------------------------------
Title: "Anytime exponential concentration of contractive stochastic approximation: Additive and multiplicative noise"
Speaker: Martín Zubeldía (ISyE, University of Minnesota, USA)
Abstract:
In this talk, we study stochastic approximation (SA) algorithms under a
contractive operator with respect to an arbitrary norm. We consider two settings
where the iterates are potentially unbounded: additive sub-Gaussian noise, and
bounded multiplicative noise. We obtain concentration bounds on the convergence
errors, and show that these errors have sub-Gaussian tails. Moreover, our bounds
hold anytime in the sense that the entire sample path lies within a tube of
decaying radius with high probability. To establish these results, we first
bound the moment generating function of the generalized Moreau envelope of the
error, which serves as a Lyapunov function. Then, we construct an exponential
supermartingale and use Ville's maximal inequality to obtain anytime exponential
concentration bounds. To overcome the challenge of having multiplicative noise,
we develop a bootstrapping argument to iteratively improve an initially loose
concentration bound and obtain a much tighter one.
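
For readers unfamiliar with the setting, the following is a minimal sketch of the standard objects the abstract refers to. The notation (x_k, F, alpha_k, w_k, gamma, M_k) is assumed here for illustration and is not taken from the talk; the precise assumptions and bounds are those of the speaker's work.

```latex
% Illustrative notation only (assumed, not the talk's).
% Stochastic approximation driven by an operator F that is a
% contraction with respect to some norm \|\cdot\|:
x_{k+1} = x_k + \alpha_k \bigl( F(x_k) - x_k + w_k \bigr),
\qquad
\| F(x) - F(y) \| \le \gamma \, \| x - y \|, \quad \gamma \in [0, 1).

% Ville's maximal inequality for a nonnegative supermartingale (M_k)_{k \ge 0}:
\mathbb{P}\Bigl( \sup_{k \ge 0} M_k \ge a \Bigr) \le \frac{\mathbb{E}[M_0]}{a},
\qquad a > 0.
```

Because the inequality controls the supremum over all k at once, a bound obtained this way applies to the whole sample path, which is the sense of "anytime" above.
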
Our results enable us to provide anytime high-probability bounds for a large
class of reinforcement learning algorithms. Since a special case of contractive
SA with multiplicative noise is linear SA with bounded, Hurwitz in expectation,
but not almost surely Hurwitz matrices, we establish high-probability bounds for
various TD-learning algorithms (such as on-policy TD with linear function
approximation, and off-policy TD) in one shot. To the best of our knowledge,
exponential concentration bounds of off-policy TD-learning have not been
established in the literature due to the challenge of handling such
multiplicative noise. Moreover, we also provide anytime high-probability bounds
for the popular Q-learning algorithm.
This is joint work with Zaiwei Chen (Caltech) and Siva Theja Maguluri (Georgia
Tech).
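
As a purely illustrative aside (not part of the talk), the additive-noise setting can be simulated in a few lines. The sketch below assumes a toy operator F(x) = r + gamma * P x with a row-stochastic matrix P, which is a gamma-contraction in the sup norm, together with Gaussian noise (a sub-Gaussian example) and a step size of 2/(k+2). The matrix, rewards, step-size rule and function names are all hypothetical choices made for this example; it does not reproduce the bounds from the abstract.

```python
import random


def apply_F(x, P, r, gamma):
    """Toy operator F(x) = r + gamma * P x (a sup-norm contraction if P is row-stochastic)."""
    n = len(x)
    return [r[i] + gamma * sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def fixed_point(P, r, gamma, iters=200):
    """Approximate the fixed point of F by plain fixed-point iteration."""
    x = [0.0] * len(r)
    for _ in range(iters):
        x = apply_F(x, P, r, gamma)
    return x


def contractive_sa(P, r, gamma, num_steps, noise_std, seed=0):
    """Run x_{k+1} = x_k + alpha_k * (F(x_k) - x_k + w_k) and record sup-norm errors."""
    rng = random.Random(seed)
    x_star = fixed_point(P, r, gamma)
    x = [0.0] * len(r)
    errors = []
    for k in range(num_steps):
        alpha = 2.0 / (k + 2)  # decaying step size (an assumed choice)
        Fx = apply_F(x, P, r, gamma)
        # Additive Gaussian noise, drawn independently per coordinate.
        x = [xi + alpha * (fi - xi + rng.gauss(0.0, noise_std))
             for xi, fi in zip(x, Fx)]
        errors.append(max(abs(xi - si) for xi, si in zip(x, x_star)))
    return x, x_star, errors


if __name__ == "__main__":
    # Hypothetical 3-state example; the rows of P sum to 1.
    P = [[0.9, 0.1, 0.0],
         [0.2, 0.6, 0.2],
         [0.0, 0.3, 0.7]]
    r = [1.0, 0.0, -1.0]
    _, _, errors = contractive_sa(P, r, gamma=0.5, num_steps=5000, noise_std=1.0)
    print("sup-norm error after 100 steps: ", errors[99])
    print("sup-norm error after 5000 steps:", errors[-1])
```

Plotting `errors` against k for several seeds gives an informal picture of the sample paths staying inside a shrinking tube around the fixed point.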
--------------------------------------------------------------------------------
Friday, September 2, at 10:30, on Zoom
Contact: Alejandro Cholaquidis - acholaquidis at hotmail.com
--------------------------------------------------------------------------------
The talk will be held on Zoom only.
Virtual meeting details:
https://salavirtual-udelar.zoom.us/j/81121640094?pwd=SWVsZ1V2TTI5aDZob0NTdXVRVzhVZz09
Seminar page: https://pye.cmat.edu.uy/seminario
Group page: https://pye.cmat.edu.uy/home
YouTube channel: https://www.youtube.com/channel/UCOPZEOrLSAYPz2qCAL-KqMg/about
--------------------------------------------------------------------------------
More seminars at: http://www.cmat.edu.uy/seminarios