Abstract: Existing quantization approaches often suffer significant accuracy degradation when compressing hybrid convolution-transformer models to low bit-widths. This paper presents ...
Abstract: As deep learning models are increasingly deployed on resource-constrained edge devices, the need to develop techniques to make these models more energy-efficient without sacrificing their ...