Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient

Document
Description

Deep neural networks (DNNs) have had tremendous success in a variety of

statistical learning applications due to their vast expressive power. Most

applications run DNNs on the cloud on parallelized architectures. There

Deep neural networks (DNNs) have had tremendous success in a variety of

statistical learning applications due to their vast expressive power. Most

applications run DNNs on the cloud on parallelized architectures. There is a need

for for efficient DNN inference on edge with low precision hardware and analog

accelerators. To make trained models more robust for this setting, quantization and

analog compute noise are modeled as weight space perturbations to DNNs and an

information theoretic regularization scheme is used to penalize the KL-divergence

between perturbed and unperturbed models. This regularizer has similarities to

both natural gradient descent and knowledge distillation, but has the advantage of

explicitly promoting the network to and a broader minimum that is robust to

weight space perturbations. In addition to the proposed regularization,

KL-divergence is directly minimized using knowledge distillation. Initial validation

on FashionMNIST and CIFAR10 shows that the information theoretic regularizer

and knowledge distillation outperform existing quantization schemes based on the

straight through estimator or L2 constrained quantization.