Hardware-Aware Bayesian Neural Architecture Search of Quantized CNNs

authors

  • Mathieu Perrin
  • William Guicquero
  • Bruno Paille
  • Gilles Sicard

keywords

  • Neural Architecture Search
  • Quantization
  • Hardware Aware
  • Bayesian Optimization

abstract

Advances in Neural Architecture Search (NAS) now provide crucial assistance in designing hardware-efficient neural networks. This paper presents a NAS method for resource-efficient, weight-quantized Convolutional Neural Networks (CNNs) under computational complexity constraints (model size and number of operations). Bayesian Optimization is used to efficiently search for traceable CNN architectures within a continuous embedding space. This embedding is the latent space of a neural architecture autoencoder, regularized with a Maximum Mean Discrepancy penalization and a convex latent predictor of parameters. On CIFAR-100, without quantization, we obtain 75% test accuracy with fewer than 2.5M parameters and 600M operations. NAS experiments on STL-10 with 32-, 8-, and 4-bit weights outperform some high-end architectures while enabling a drastic model size reduction (6 Mb to 840 kb). This demonstrates our method's ability to discover lightweight, high-performing models, and showcases the importance of quantization for improving the trade-off between accuracy and model size.
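
The latent-space regularization mentioned in the abstract can be illustrated with a short sketch. The code below is a minimal, hypothetical example (not the authors' implementation) of a Gaussian-kernel Maximum Mean Discrepancy penalty that pulls an architecture autoencoder's latent codes toward a standard normal prior; the kernel choice, prior, bandwidth, and weighting are illustrative assumptions and may differ from those used in the paper.

    # Minimal sketch (assumed setup, not the authors' code): Gaussian-kernel MMD
    # penalty encouraging autoencoder latent codes to match a N(0, I) prior.
    import torch

    def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
        # Pairwise squared Euclidean distances between rows of x and y,
        # mapped through a Gaussian (RBF) kernel.
        d2 = torch.cdist(x, y) ** 2
        return torch.exp(-d2 / (2.0 * sigma ** 2))

    def mmd_penalty(latent: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
        # Estimate of MMD^2 between encoder outputs and samples from the prior.
        prior = torch.randn_like(latent)
        k_zz = gaussian_kernel(latent, latent, sigma).mean()
        k_pp = gaussian_kernel(prior, prior, sigma).mean()
        k_zp = gaussian_kernel(latent, prior, sigma).mean()
        return k_zz + k_pp - 2.0 * k_zp

    # Hypothetical usage: the penalty is added to the autoencoder's
    # reconstruction loss with an assumed weighting factor lambda_mmd.
    # loss = reconstruction_loss + lambda_mmd * mmd_penalty(encoder(arch_encoding))

Regularizing the latent space this way keeps it smooth and densely populated, which is what allows Bayesian Optimization to search over continuous embeddings and decode them back into valid architectures.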
