First and Second Moments
In contrast to what we’ve seen so far, Adam relies on two running averages to adapt the effective learning rate of each weight: $m_t$ and $v_t$. Both are initialized to zero at the start of training and are updated at every step $t$ as follows:
$$
\begin{align}
g_t &= \frac{\partial E}{\partial W} \quad \text{(the gradient at training step } t\text{)}\\
m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1)\, g_t\\
v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}
\end{align}
$$
They are exponentially decaying moving averages of the gradient and of the element-wise squared gradient, i.e. estimates of its first moment (the mean) and its second raw moment (the uncentred variance).
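To make the "uncentred" part concrete, here is a minimal pure-Python sketch (the helper name `update_moments` is our own, not from the paper) that feeds a single weight a gradient alternating between 1 and 3. Those gradients have mean 2 and mean square 5, so after enough steps $m_t$ should hover near 2 and $v_t$ near 5, rather than near the gradient's variance of 1.

```python
def update_moments(m, v, g, beta1=0.9, beta2=0.999):
    """Apply one step of Adam's moment updates for a scalar gradient g.

    Hypothetical helper for illustration; returns the new (m, v).
    """
    m = beta1 * m + (1 - beta1) * g       # EMA of the gradient (first moment)
    v = beta2 * v + (1 - beta2) * g * g   # EMA of the squared gradient (uncentred second moment)
    return m, v


m, v = 0.0, 0.0  # both start at zero, as in Adam
for t in range(1, 20001):
    g = 1.0 if t % 2 == 1 else 3.0  # alternating gradients: mean 2, mean of squares 5
    m, v = update_moments(m, v, g)

print(f"m = {m:.3f}, v = {v:.3f}")
```

$m_t$ still wobbles slightly because its short memory ($\beta_1 = 0.9$) reacts to each new gradient, while $v_t$, with $\beta_2 = 0.999$, averages over roughly a thousand steps and is much smoother.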
Bias correction
Before we use them in the final weight update, notice that because both $m_t$ and $v_t$ start at zero, they are biased towards zero, especially during the first steps of training. To correct for this bias, Adam computes the bias-corrected estimates $\hat{m_t}$ and $\hat{v_t}$ as follows:
$$
\hat{m_t} \leftarrow \dfrac{m_t}{1-\beta_1^t}\\
\hat{v_t} \leftarrow \dfrac{v_t}{1-\beta_2^t}
$$
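A quick way to see the bias is to feed a constant gradient $g_t = 1$, whose true mean and uncentred second moment are both 1. In this case $m_t = 1 - \beta_1^t$ exactly, so the raw averages start far below 1 (for example $m_1 = 0.1$ and $v_1 = 0.001$), while the corrected values equal 1 from the very first step. A minimal sketch using the paper's default $\beta_1$ and $\beta_2$:

```python
beta1, beta2 = 0.9, 0.999
g = 1.0  # constant gradient: true first and second moments are both 1.0

m, v = 0.0, 0.0
history = []
for t in range(1, 6):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    history.append((t, m, m_hat, v, v_hat))
    print(f"t={t}: m={m:.4f} m_hat={m_hat:.4f}  v={v:.6f} v_hat={v_hat:.4f}")
```

The correction matters most for $v_t$: with $\beta_2 = 0.999$ the raw value needs on the order of a thousand steps to approach the true second moment.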
The derivation of this correction is given in Section 3 of the original paper. With the bias-corrected $\hat{m_t}$ and $\hat{v_t}$, the update rule becomes:
$$
W_t \leftarrow W_{t-1} - \eta \dfrac{\hat{m_t}}{\sqrt{\hat{v_t}} + \epsilon}
$$
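where $\eta$ is the learning rate, set manually, and $\epsilon$ is a small constant that guards against division by zero. The square root, division and addition are all applied element-wise, so every weight effectively gets its own step size.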
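Putting the moment updates, the bias correction and the update rule together, here is a minimal pure-Python sketch of one Adam step applied element-wise to a list of weights. The function name `adam_step` and the toy objective $E(w) = (w - 3)^2$ are illustrative assumptions, and the learning rate of 0.1 in the usage below is chosen only so that the toy problem converges quickly.

```python
import math


def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update applied element-wise to lists of floats.

    w: weights, g: gradients, m/v: first/second moment estimates,
    t: 1-based step counter. Returns the new (w, m, v).
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, g, m, v):
        mi = beta1 * mi + (1 - beta1) * gi        # first moment (mean)
        vi = beta2 * vi + (1 - beta2) * gi * gi   # second raw moment
        m_hat = mi / (1 - beta1 ** t)             # bias corrections
        v_hat = vi / (1 - beta2 ** t)
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


# Toy problem (an assumption for illustration): minimize E(w) = (w - 3)^2,
# whose gradient is dE/dw = 2 * (w - 3).
w, m, v = [0.0], [0.0], [0.0]   # m and v start at zero
for t in range(1, 1001):        # t starts at 1 so the bias corrections are defined
    g = [2.0 * (w[0] - 3.0)]
    w, m, v = adam_step(w, g, m, v, t, lr=0.1)

print(w[0])  # should end up close to 3.0
```

Deep learning frameworks perform the same computation on whole tensors at once; the explicit loop here just makes the element-wise nature of every operation visible.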
Signal to Noise Ratio
Roughly speaking, you can think of $\dfrac{\hat{m_t}}{\sqrt{\hat{v_t}}}$ as a signal-to-noise ratio (SNR). When the SNR is low, the update is small. This is a good thing, because a low SNR means greater uncertainty about whether the direction of $\hat{m_t}$ actually leads towards the minimum. The SNR also typically moves closer to zero as we approach the minimum, which shrinks the steps automatically (the paper calls this a form of automatic annealing). This is desirable, since we want the updates to settle down at the minimum.
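The effect can be checked with two deterministic gradient streams of the same magnitude. In the minimal sketch below (the helper `snr_after` is hypothetical), a gradient that always points the same way gives a ratio of about 1, so the step is about $\eta$, while a gradient that flips sign every step gives a ratio of exactly $-1/19$ after 10 steps, so the step is roughly 19 times smaller.

```python
import math


def snr_after(grads, beta1=0.9, beta2=0.999):
    """Return m_hat / sqrt(v_hat) after feeding a non-empty list of scalar gradients."""
    m = v = 0.0
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # t is the last (1-based) step index
    v_hat = v / (1 - beta2 ** t)
    return m_hat / math.sqrt(v_hat)


consistent = [1.0] * 10          # gradient always points the same way: strong signal
alternating = [1.0, -1.0] * 5    # gradient keeps flipping sign: mostly noise

print(snr_after(consistent))     # ≈ 1.0
print(snr_after(alternating))    # ≈ -0.053 (that is, -1/19)
```

In both cases $\hat{v_t} = 1$, so the difference comes entirely from $\hat{m_t}$: opposing gradients cancel in the first-moment average but not in the second.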
Recommended parameter values
The authors of the paper (see Algorithm 1) recommend the following default values:
$$
\begin{align}
\eta &= 0.001\\
\beta_{1} &= 0.9\\
\beta_{2} &= 0.999\\
\epsilon &= 10^{-8}
\end{align}
$$
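These defaults can be kept together in a small config object. The sketch below (the `AdamConfig` class and the `first_step` helper are our own illustration) also checks one consequence of the update rule: on the first step $\hat{m_1} = g_1$ and $\hat{v_1} = g_1^2$, so the update is $-\eta\, g_1 / (|g_1| + \epsilon)$, roughly $\eta$ in size whether the gradient is tiny or huge. For reference, PyTorch's `torch.optim.Adam` uses the same defaults (`lr=1e-3`, `betas=(0.9, 0.999)`, `eps=1e-8`).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdamConfig:
    lr: float = 0.001    # eta
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8


def first_step(g, cfg=AdamConfig()):
    """Weight change produced by Adam's very first update (t = 1) for a scalar gradient g."""
    m = (1 - cfg.beta1) * g        # m_1, since m_0 = 0
    v = (1 - cfg.beta2) * g * g    # v_1, since v_0 = 0
    m_hat = m / (1 - cfg.beta1)    # 1 - beta1 ** 1
    v_hat = v / (1 - cfg.beta2)    # 1 - beta2 ** 1
    return -cfg.lr * m_hat / (math.sqrt(v_hat) + cfg.eps)


for g in (1e-3, 1.0, 1e3):
    print(g, first_step(g))    # each step is close to -0.001
```

Only when $|g_1|$ approaches $\epsilon$ does the step noticeably shrink below $\eta$, which is why $\epsilon$ is set so small.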
In the next post, let’s look at AdaMax, a variant of Adam presented in the original paper itself.