Optimization

References

http://videolectures.net/deeplearning2015_goodfellow_network_optimization/ (Ian Goodfellow's tutorial on neural network optimization at Deep Learning Summer School 2015)
http://int8.io/comparison-of-optimization-techniques-stochastic-gradient-descent-momentum-adagrad-and-adadelta (implementation and comparison of popular methods)
http://www.deeplearningbook.org/contents/numerical.html (basic intro in 4.3)
http://www.deeplearningbook.org/contents/optimization.html (8.1 generalization, 8.2 problems, 8.3 algorithms, 8.4 initialization, 8.5 adaptive learning rates, 8.6 approximate second-order methods, 8.7 meta-algorithms)
http://andrew.gibiansky.com/blog/machine-learning/gauss-newton-matrix/ (great posts on optimization)
https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf (excellent tutorial on conjugate gradient, gradient descent, eigenvectors, etc.)
http://arxiv.org/abs/1412.6544 (Goodfellow paper)
https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides/lec6.pdf (Hinton slides, lecture 6)
https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides/lec8.pdf (Hinton slides, lecture 8)
http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf
http://arxiv.org/abs/1503.05671
http://arxiv.org/abs/1412.1193
http://www.springer.com/us/book/9780387303031 (Nocedal and Wright)
http://www.nrbook.com (Numerical Recipes)
https://maths-people.anu.edu.au/~brent/pub/pub011.html (Brent, minimization without derivatives)
http://stanford.edu/~boyd/cvxbook/ (Boyd, convex optimization only)