NoProp: Training Neural Networks without Back-propagation or Forward-propagation

Abstract

The canonical deep learning approach to training requires computing a gradient term at each layer by back-propagating the error signal from the output towards each learnable parameter. Given the stacked structure of neural networks, where each layer builds on the representation of the layer below, this approach leads to hierarchical representations. More abstract features live on the top layers of the model, while features on lower layers are expected to be less abstract. In contrast to this, we introduce a new learning method named NoProp, which does not rely on either forward or backward propagation. Instead, NoProp takes inspiration from diffusion and flow matching methods, where each layer independently learns to denoise a noisy target. We believe this work takes a first step towards introducing a new family of gradient-free learning methods that do not learn hierarchical representations, at least not in the usual sense. NoProp needs to fix the representation at each layer beforehand to a noised version of the target, learning a local denoising process that can then be exploited at inference. We demonstrate the effectiveness of our method on MNIST, CIFAR-10, and CIFAR-100 image classification benchmarks. Our results show that NoProp is a viable learning algorithm which achieves superior accuracy, is easier to use, and is computationally more efficient than other existing back-propagation-free methods. By departing from the traditional gradient-based learning paradigm, NoProp alters how credit assignment is done within the network, enabling more efficient distributed learning as well as potentially impacting other characteristics of the learning process.
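
To make the idea of fixing each layer's representation to a noised version of the target more concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a one-hot label as the clean target, a variance-preserving Gaussian noising rule borrowed from standard diffusion models, and a linear noise schedule. The helper names (`noise_levels`, `noisy_target`, `layer_training_pairs`, `local_loss`) are hypothetical, and the sketch leaves out the input image and the per-layer model itself.

```python
import math
import random


def one_hot(label, num_classes):
    """Clean target embedding: a one-hot vector for the class label (assumed)."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def noise_levels(num_layers):
    """Hypothetical linear schedule: fraction of signal kept at each layer's input.

    The first layer sees mostly noise, the last layer sees a mostly clean target.
    """
    return [t / (num_layers + 1) for t in range(1, num_layers + 1)]


def noisy_target(clean, keep, rng):
    """Variance-preserving mix of the clean target with Gaussian noise (assumed rule)."""
    return [
        math.sqrt(keep) * c + math.sqrt(1.0 - keep) * rng.gauss(0.0, 1.0)
        for c in clean
    ]


def layer_training_pairs(label, num_classes, num_layers, rng):
    """Build one (noisy input, clean target) pair per layer.

    Each pair depends only on the label and fresh noise, never on the output
    of another layer, which is what lets the layers be trained independently.
    """
    clean = one_hot(label, num_classes)
    return [
        (noisy_target(clean, keep, rng), clean)
        for keep in noise_levels(num_layers)
    ]


def local_loss(prediction, clean):
    """Squared error between one layer's denoised guess and the clean target."""
    return sum((p - c) ** 2 for p, c in zip(prediction, clean))
```

In this sketch, calling `layer_training_pairs(label, num_classes, num_layers, random.Random(0))` yields the per-layer training data up front, so each layer could in principle be fitted on its own pair with `local_loss` on a separate worker, without waiting for activations or error signals from any other layer.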