Molo Mbasa J., Vadicamo Lucia, Gennaro Claudio, Carlini Emanuele
Keywords: Knowledge distillation; Information dissimilarity measure; Decentralized learning
Decentralized learning is emerging as a scalable and privacy-preserving alternative to centralized machine learning, particularly in distributed systems where data cannot be centrally shared among multiple nodes or clients. While Federated Learning is widely adopted in this context, Knowledge Distillation (KD) is emerging as a flexible and scalable alternative in which model outputs are used to share knowledge among distributed clients. However, existing studies often overlook the efficiency and effectiveness of different knowledge transfer strategies in KD, especially in decentralized environments where data is non-IID. This study provides key insights by examining the impact of network topology and distillation strategy in KD-based decentralized learning approaches. Our evaluation spans several dissimilarity measures, including Cross-Entropy (CE), Kullback-Leibler (KL) divergence, Triangular Divergence (TD), Jensen-Shannon (JS) divergence, Structural Entropic Distance (SED), and Multi-way SED, assessed under both pairwise and holistic distillation schemes. In the pairwise approach, distillation is performed by summing the dissimilarities between a client's output and each neighbor's prediction individually, while the holistic approach computes the dissimilarity with respect to the average of the predictions received from neighboring clients. We also analyze performance across client connectivity levels to explore the trade-off between convergence speed and model accuracy. The results indicate that the holistic distillation approach, which averages client predictions, outperforms the sum of pairwise distillations, especially when employing alternative measures such as TD, SED, and JS; these measures offer improved performance over the conventional CE and KL divergence.
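The contrast between the two distillation schemes described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it uses KL divergence as the dissimilarity measure and toy softmax vectors, and the function names (`pairwise_loss`, `holistic_loss`) are hypothetical.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def pairwise_loss(client_probs, neighbor_probs):
    """Pairwise scheme: sum the dissimilarity between the client's output
    and each neighbor's prediction individually."""
    return sum(kl_div(client_probs, q) for q in neighbor_probs)

def holistic_loss(client_probs, neighbor_probs):
    """Holistic scheme: dissimilarity with respect to the average of the
    predictions received from neighboring clients."""
    avg = np.mean(neighbor_probs, axis=0)
    return kl_div(client_probs, avg)

# Toy softmax outputs for one sample on one client and two neighbors.
client = np.array([0.7, 0.2, 0.1])
neighbors = [np.array([0.6, 0.3, 0.1]), np.array([0.5, 0.2, 0.3])]
print("pairwise:", pairwise_loss(client, neighbors))
print("holistic:", holistic_loss(client, neighbors))
```

Since KL(p || q) is convex in q, the holistic loss is bounded above by the per-neighbor average of the pairwise terms, so the two schemes weight disagreement with neighbors differently during training.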
Source: FUTURE GENERATION COMPUTER SYSTEMS, vol. 176
@article{oai:iris.cnr.it:20.500.14243/560821,
title = {Decentralized edge learning: a comparative study of distillation strategies and dissimilarity measures},
author = {Molo Mbasa J. and Vadicamo Lucia and Gennaro Claudio and Carlini Emanuele},
doi = {10.1016/j.future.2025.108171},
year = {2026}
}
National Centre for HPC, Big Data and Quantum Computing
Sustainable Mobility Center