Neural network models have grown rapidly in size over the past decades, and larger models are increasingly difficult to deploy on edge devices. Recent research has shown that neural networks can also operate at much lower precision, even down to 1 bit. This presentation therefore briefly addresses the following three topics:
1. Why must we explore ultra-low-precision quantization for neural networks on edge devices?
2. What is a quantized neural network, and how does it behave at different precisions and in different models? (A minimal quantization sketch follows this list.)
3. How can we train quantized neural networks and implement them as hardware accelerators on FPGAs and ASICs?
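To make "even down to 1 bit" concrete, the following is a minimal NumPy sketch of per-tensor symmetric uniform quantization and of 1-bit binarization with a mean-magnitude scale (as used in XNOR-Net-style networks). The function names and the per-tensor scaling choice are illustrative assumptions, not the specific scheme covered in the talk.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric per-tensor uniform quantization to `bits` bits (bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax      # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantized ("fake-quantized") values

def binarize(w):
    """1-bit quantization: keep only the sign, scaled by the mean magnitude."""
    alpha = np.mean(np.abs(w))
    return alpha * np.where(w >= 0, 1.0, -1.0)

w = np.random.randn(4, 4).astype(np.float32)
print(quantize_uniform(w, 8))   # near-lossless at 8 bits
print(quantize_uniform(w, 2))   # visibly coarse at 2 bits
print(binarize(w))              # the extreme 1-bit case mentioned above
```

Quantization-aware training typically applies such a function in the forward pass while passing gradients straight through it, which is one reason low-precision networks can retain accuracy; the hardware-oriented training and deployment flow is the subject of topic 3.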
Chair of Processor Design