Optimization
References
<http://videolectures.net/deeplearning2015_goodfellow_network_optimization/> (Ian Goodfellow's tutorial on neural network optimization at Deep Learning Summer School 2015)
<http://int8.io/comparison-of-optimization-techniques-stochastic-gradient-descent-momentum-adagrad-and-adadelta> (implementation and comparison of popular methods)
<http://www.deeplearningbook.org/contents/numerical.html> (basic introduction in section 4.3)
<http://www.deeplearningbook.org/contents/optimization.html> (8.1 generalization, 8.2 problems, 8.3 algorithms, 8.4 initialization, 8.5 adaptive learning rates, 8.6 approximate second-order methods, 8.7 meta-algorithms)
<http://andrew.gibiansky.com/blog/machine-learning/gauss-newton-matrix/> (great posts on optimization)
<https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf> (excellent tutorial on conjugate gradient, gradient descent, eigenvectors, etc.)
<http://arxiv.org/abs/1412.6544> (Goodfellow et al. paper, "Qualitatively characterizing neural network optimization problems")
<https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides/lec6.pdf> (Hinton slides, lecture 6)
<https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides/lec8.pdf> (Hinton slides, lecture 8)
<http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html> (Alec Radford's animations of optimization algorithms)
<http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf> (Martens, ICML 2010, "Deep learning via Hessian-free optimization")
<http://arxiv.org/abs/1503.05671> (Martens and Grosse, Kronecker-factored approximate curvature)
<http://arxiv.org/abs/1412.1193> (Martens, natural gradient method)
<http://www.springer.com/us/book/9780387303031> (Nocedal and Wright, Numerical Optimization)
<http://www.nrbook.com> (Numerical Recipes)
<https://maths-people.anu.edu.au/~brent/pub/pub011.html> (Brent, minimization without derivatives)
<http://stanford.edu/~boyd/cvxbook/> (Boyd and Vandenberghe, convex optimization only)