Thermodynamic natural gradient descent

Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by digital computers). Here, we show that natural gradient descent (NGD), a second-order method, can have a per-iteration computational complexity similar to that of a first-order method when employing appropriate hardware. We present a new hybrid digital-analog algorithm f