I always wondered why authors don't/can't explain the concepts to a lay audience who just knows basic math and good coding.

If the authors could make it simpler, deep learning and ML would have an impact on the scale of the open-source movement. The current tutorials and articles are only for academia.

By the way, is everyone using (steepest-descent) SGD? Can't we get away with just gradient-related directions?

Can you mention some of the alternative methods that deep learning is outperforming? For example, what are the close competitors to deep learning in its main application domains (speech recognition and vision, I suppose)?

It would be interesting to know if the alternatives are theoretically better understood or not.

In the post, is the choice of 1/2 as the dropout rate an arbitrary constant, or was there more to it? Is the dropout probability fixed, or is it optimized and changed over the course of training? If the latter, is there a procedure by which the algorithm realizes it is probably overfitting and remedies this by increasing the dropout probability?
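For concreteness, here is a minimal NumPy sketch of what a fixed 1/2 dropout rate typically means in code. This is the common "inverted dropout" formulation, not necessarily what the post's authors used; the function name, the fixed `p=0.5`, and the train/test flag are assumptions for illustration:

```python
import numpy as np

def dropout(activations, p=0.5, train=True, rng=None):
    """Inverted dropout sketch (assumed formulation, not from the post).

    At train time, each unit is zeroed with probability p and the
    survivors are scaled by 1/(1-p), so at test time the activations
    can be used unchanged (no rescaling needed).
    """
    if not train or p == 0.0:
        return activations
    rng = rng or np.random.default_rng(0)
    mask = rng.random(activations.shape) >= p  # keep each unit with prob 1-p
    return activations * mask / (1.0 - p)

# With p=0.5, surviving units of an all-ones input become 2.0, dropped units 0.0
out = dropout(np.ones((4, 8)), p=0.5)
```

In most descriptions the rate is a hand-picked hyperparameter held fixed during training (0.5 for hidden units is a conventional default), rather than something the algorithm adapts when it detects overfitting.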
