Quantitative strategies of different loss functions aggregation for knowledge distillation

Huong-Giang Doan, Ngoc-Trung Nguyen

Abstract


Deep learning models have been successfully applied to many visual tasks. However, they tend to be increasingly cumbersome due to their high computational complexity and large storage requirements. How to compress convolutional neural network (CNN) models while still maintaining their efficiency has received increasing attention from the community, and knowledge distillation (KD) is an efficient way to do this. Existing KD methods have focused on selecting good teachers from multiple teachers, or on KD layers, which is cumbersome, computationally expensive, and requires large neural networks for individual models. Most teacher and student modules are CNN-based networks. In addition, recently proposed KD methods have utilized the cross-entropy (CE) loss function in both the student network and the KD network. This research focuses on the quantifiable evaluation of the teacher-student model, in which knowledge is distilled not only between training models that share the same CNN architecture but also across different architectures. Furthermore, we propose a combination of CE, balanced cross-entropy (BCE), and focal loss functions to not only soften the value of the loss function when transferring knowledge from a large teacher model to a small student model but also increase classification performance. The proposed solution is evaluated on four benchmark static image datasets, and the experimental results show that it outperforms state-of-the-art (SOTA) methods by 2.67% to 9.84% in top-1 accuracy.
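The abstract describes aggregating CE, BCE, and focal losses with a temperature-softened distillation term; the paper's exact weighting scheme and hyperparameters are not stated here. The following is a minimal sketch of such an aggregated loss for a single sample, assuming equal default weights, an inverse-frequency class weight for the BCE term, and a KL-divergence distillation term scaled by T^2 as in standard KD; all names and defaults are illustrative, not the authors' implementation.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T produces softer distributions.
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def combined_kd_loss(student_logits, teacher_logits, target,
                     class_weight=None, gamma=2.0, T=4.0,
                     w_ce=1.0, w_bce=1.0, w_focal=1.0, w_kd=1.0):
    """Weighted sum of CE, balanced CE, focal, and distillation losses
    for one sample with integer class label `target` (illustrative)."""
    p = softmax(student_logits)
    pt = p[target]
    # Standard cross-entropy on the hard label.
    ce = -math.log(pt)
    # Balanced CE: CE scaled by a per-class weight (e.g. inverse frequency).
    alpha = class_weight[target] if class_weight else 1.0
    bce = alpha * ce
    # Focal loss: down-weights well-classified (easy) examples.
    focal = -((1.0 - pt) ** gamma) * math.log(pt)
    # Distillation: KL(teacher || student) at temperature T, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    q_t = softmax(teacher_logits, T)
    q_s = softmax(student_logits, T)
    kd = (T * T) * sum(qt * math.log(qt / qs) for qt, qs in zip(q_t, q_s))
    return w_ce * ce + w_bce * bce + w_focal * focal + w_kd * kd
```

In practice such a loss would be applied batch-wise in a deep learning framework; the per-sample form above just makes the aggregation of the four terms explicit.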


Keywords


Convolutional neural network; Deep learning; Knowledge distillation; Student-teacher model; Transfer learning



DOI: http://doi.org/10.11591/ijai.v13.i3.pp3240-3249



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938 
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
