pymc.adagrad
- pymc.adagrad(loss_or_grads=None, params=None, learning_rate=1.0, epsilon=1e-06)
- Adagrad updates.
- Scale learning rates by dividing by the square root of accumulated squared gradients. See [1] for further description.
- Parameters:
- loss_or_grads: symbolic expression or list of expressions
- A scalar loss expression, or a list of gradient expressions 
- params: list of shared variables
- The variables to generate update expressions for 
- learning_rate: float or symbolic scalar
- The learning rate controlling the size of update steps 
- epsilon: float or symbolic scalar
- Small value added for numerical stability 
 
- Returns:
- OrderedDict
- A dictionary mapping each parameter to its update expression 
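The update described above can be mimicked on concrete numbers. The following plain-Python sketch is illustrative only. `adagrad_step` is a hypothetical helper, not part of PyMC. It assumes one running sum of squared gradients per parameter, with epsilon added inside the square root as in the formula in the Notes. Like the documented function, it returns an OrderedDict keyed by parameter, but its values are new floats rather than symbolic update expressions.

```python
import math
from collections import OrderedDict


def adagrad_step(params, grads, accum, learning_rate=1.0, epsilon=1e-6):
    """One Adagrad step on plain Python floats (hypothetical helper, not PyMC API).

    params, grads: dicts mapping a parameter name to its value / gradient.
    accum: dict holding the running sum of squared gradients per name;
           it is updated in place.
    Returns an OrderedDict mapping each parameter name to its new value.
    """
    updates = OrderedDict()
    for name, value in params.items():
        g = grads[name]
        # Accumulate g^2 over all steps seen so far, including this one.
        accum[name] = accum.get(name, 0.0) + g * g
        # Epsilon sits inside the square root, matching the formula in the Notes.
        updates[name] = value - learning_rate * g / math.sqrt(accum[name] + epsilon)
    return updates


# Usage: the loss b = 2 * a has gradient 2 with respect to a at every step.
params = {"a": 1.0}
accum = {}
for _ in range(3):
    params = dict(adagrad_step(params, {"a": 2.0}, accum, learning_rate=0.01))
```

Each step in this run is smaller than the previous one, because the accumulated sum in the denominator keeps growing while the gradient stays constant.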
 
- Notes
- Using step size \(\eta\), Adagrad scales the gradient \(g_{t,i}\) of feature \(i\) at time step \(t\) as:
\[\eta_{t,i} = \frac{\eta}{\sqrt{\sum_{t'=1}^{t} g_{t',i}^2 + \epsilon}}\, g_{t,i}\]
- The factor \(\eta / \sqrt{\sum_{t'=1}^{t} g_{t',i}^2 + \epsilon}\) is the per-feature learning rate. Because the sum of squared gradients can only grow, this learning rate is monotonically non-increasing.
- Epsilon is not included in the typical formula; see [2].
- If both loss_or_grads and params are omitted, the optimizer returns a partial function, which can later be called with them.
- References
- [1] Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12:2121-2159.
- [2] Dyer, C. Notes on AdaGrad. http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf
- Examples
>>> import pytensor
>>> from pymc import adagrad
>>> a = pytensor.shared(1.0)
>>> b = a * 2
>>> updates = adagrad(b, [a], learning_rate=0.01)
>>> isinstance(updates, dict)
True
>>> optimizer = adagrad(learning_rate=0.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True
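The Notes describe the per-feature learning rate as shrinking over time. Strictly, it never increases, and it stays flat at any step where the gradient is zero. The plain-Python sketch below tracks that factor for a single feature over a made-up gradient sequence. `effective_learning_rates` is a hypothetical helper, not a PyMC function. It assumes epsilon is added inside the square root, as in the formula.

```python
import math


def effective_learning_rates(grads, learning_rate=1.0, epsilon=1e-6):
    """Per-step factor eta / sqrt(sum of g^2 + epsilon) for one feature.

    grads: sequence of gradient values g_1, ..., g_T for a single feature i.
    Hypothetical helper for illustration; not part of PyMC.
    """
    rates = []
    total = 0.0
    for g in grads:
        total += g * g  # running sum of squared gradients up to this step
        rates.append(learning_rate / math.sqrt(total + epsilon))
    return rates


rates = effective_learning_rates([0.5, -2.0, 0.0, 1.0], learning_rate=0.1)
# Each rate is no larger than the one before it; the zero gradient at the
# third step leaves the rate unchanged.
assert all(later <= earlier for earlier, later in zip(rates, rates[1:]))
```

Epsilon also keeps the factor finite when every gradient so far is zero. For example, `effective_learning_rates([0.0])` gives about 1000 (that is, 1 / sqrt(1e-6)) instead of dividing by zero.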