Optimizing Algorithms - Recap and Resources

So far we’ve covered the following optimizations for gradient descent:

  1. Nesterov Accelerated Gradient
  2. AdaGrad
  3. AdaDelta
  4. RMSProp
  5. Adam
  6. Adamax
  7. Nadam

The structure and order follow Ruder’s excellent article on optimizing gradient descent. You can also find a comparative analysis in this article. For a deeper understanding of momentum, read this article on Distill.
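
All of these optimizers share the same basic shape: keep running statistics of the gradients and use them to scale the parameter update. As a concrete reference point, here is a minimal NumPy sketch of the Adam update rule; the hyperparameter names and defaults (lr, beta1, beta2, eps) are the conventional ones and are used here only for illustration, not tied to any particular library's API.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. Returns the new parameters and updated moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (running uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)                # bias correction for the second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)  # should end up close to [0, 0]
```

Swapping in RMSProp or AdaGrad amounts to changing how the second-moment term is accumulated; the overall update loop stays the same.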

Starting next week, let’s begin writing our own deep learning library: not to reinvent the wheel, but to test our own understanding and build better intuition. In parallel, let’s also start working with image data.

XKCD Comic