Conference article  Open Access

Black box explanation by learning image exemplars in the latent feature space

Guidotti R., Monreale A., Matwin S., Pedreschi D.

Keywords: Explainable AI; Adversarial autoencoder; Image exemplars; Computer Science - Machine Learning (cs.LG); Computer Science - Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences

We present an approach for explaining the decisions of black box models for image classification. Using the black box to label images, our explanation method exploits the latent feature space learned by an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars that respect local decision rules. Finally, it visualizes them in a manner that shows the user how the exemplars can be modified either to stay within their class or to become counter-factuals by "morphing" into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting both the areas of the image that contribute to its classification and the areas that push it towards another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
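The pipeline described in the abstract (encode the instance, generate latent neighbors, label them with the black box, fit a surrogate decision tree, decode exemplars and counter-exemplars) can be sketched with a toy implementation. The `encode`, `decode`, and `black_box` functions below are hypothetical stand-ins: a real system would use an adversarial autoencoder and a deep image classifier, and the paper's neighborhood generation is more elaborate than the plain Gaussian sampling used here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions) for the paper's components: a real system
# would use an adversarial autoencoder and a deep image classifier.
def encode(img):
    # image -> latent vector (toy: mean of each 4-pixel block)
    return img.reshape(4, 4).mean(axis=1)

def decode(z):
    # latent vector -> image (toy: repeat each latent value 4 times)
    return np.repeat(z, 4)

def black_box(img):
    # toy classifier to be explained: class 1 if the mean pixel is positive
    return int(img.mean() > 0)

def explain(img, n_samples=500, sigma=0.5):
    z = encode(img)
    # 1. generate a synthetic neighborhood around z in the latent space
    Z = z + sigma * rng.standard_normal((n_samples, z.size))
    # 2. label the decoded neighbors with the black box
    y = np.array([black_box(decode(zi)) for zi in Z])
    # 3. learn an interpretable surrogate (decision tree) on the latent points
    tree = DecisionTreeClassifier(max_depth=3).fit(Z, y)
    # 4. keep neighbors on which the tree agrees with the black box:
    #    exemplars share the instance's label, counter-exemplars do not
    label = black_box(img)
    agree = tree.predict(Z) == y
    exemplars = [decode(zi) for zi, yi, ok in zip(Z, y, agree) if ok and yi == label]
    counters = [decode(zi) for zi, yi, ok in zip(Z, y, agree) if ok and yi != label]
    return exemplars, counters

img = rng.standard_normal(16)  # a 16-pixel "image"
ex, cex = explain(img)
```

By construction, every returned exemplar receives the same black box label as the instance and every counter-exemplar receives a different one; decoding them back to image space is what lets the method render them, and their pixel-wise differences from the instance yield the saliency map.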

Source: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019, pp. 189–205, Würzburg, Germany, 16–20 September 2019

1. S. Bach, A. Binder, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140, 2015.
2. J. Bien et al. Prototype selection for interpretable classification. AOAS, 2011.
3. L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
4. C. Chen, O. Li, A. Barnett, J. Su, and C. Rudin. This looks like that: deep learning for interpretable image recognition. arXiv:1806.10574, 2018.
5. F. Doshi-Velez and B. Kim. Towards a rigorous science of interpretable machine learning. arXiv:1702.08608, 2017.
6. H. J. Escalante, S. Escalera, I. Guyon, et al. Explainable and interpretable models in computer vision and machine learning. Springer, 2018.
7. R. C. Fong and A. Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In ICCV, pages 3429–3437, 2017.
8. M. Frixione et al. Prototypes vs exemplars in concept representation. KEOD, 2012.
9. N. Frosst et al. Distilling a neural network into a soft decision tree. arXiv:1711.09784, 2017.
10. I. Goodfellow et al. Generative adversarial nets. In NIPS, 2014.
11. R. Guidotti et al. Local rule-based explanations of black box decision systems. arXiv:1805.10820, 2018.
12. R. Guidotti, A. Monreale, and L. Cariaggi. Investigating neighborhood generation for explanations of image classifiers. In PAKDD, 2019.
13. R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, et al. A survey of methods for explaining black box models. ACM CSUR, 51(5):93:1–42, 2018.
14. R. Guidotti and S. Ruggieri. On the stability of interpretable models. IJCNN, 2019.
15. S. Hara et al. Maximally invariant data perturbation as explanation. arXiv:1806.07004, 2018.
16. K. He et al. Deep residual learning for image recognition. In CVPR, 2016.
17. G. Hinton et al. Distilling the knowledge in a neural network. arXiv:1503.02531, 2015.
18. B. Kim et al. Examples are not enough, learn to criticize! In NIPS, 2016.
19. O. Li, H. Liu, C. Chen, and C. Rudin. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. In AAAI, 2018.
20. A. Makhzani, J. Shlens, et al. Adversarial autoencoders. arXiv:1511.05644, 2015.
21. D. A. Melis and T. Jaakkola. Towards robust interpretability with self-explaining neural networks. In NIPS, 2018.
22. C. Molnar. Interpretable machine learning. LeanPub, 2018.
23. C. Panigutti, R. Guidotti, A. Monreale, and D. Pedreschi. Explaining multi-label black-box classifiers for health applications. In W3PHIAI, 2019.
24. M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In KDD, pages 1135–1144. ACM, 2016.
25. A. Shrikumar et al. Not just a black box: Learning important features through propagating activation differences. arXiv:1605.01713, 2016.
26. N. Siddharth, B. Paige, A. Desmaison, J.-W. van de Meent, et al. Inducing interpretable representations with variational autoencoders. arXiv:1611.07492, 2016.
27. K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv:1312.6034, 2013.
28. T. Spinner et al. Towards an interpretable latent space: an intuitive comparison of autoencoders with variational autoencoders. In IEEE VIS, 2018.
29. K. Sun, Z. Zhu, and Z. Lin. Enhancing the robustness of deep neural networks by boundary conditional GAN. arXiv:1902.11029, 2019.
30. M. Sundararajan et al. Axiomatic attribution for deep networks. In ICML. JMLR, 2017.
31. J. van der Waa et al. Contrastive explanations with local foil trees. arXiv:1806.07470, 2018.
32. J. Xie et al. Image denoising and inpainting with deep neural networks. In NIPS, 2012.
33. M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, pages 818–833. Springer, 2014.


BibTeX entry
@inproceedings{guidotti2019blackbox,
	title = {Black box explanation by learning image exemplars in the latent feature space},
	author = {Guidotti R. and Monreale A. and Matwin S. and Pedreschi D.},
	doi = {10.1007/978-3-030-46150-8_12},
	eprint = {2002.03746},
	booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019},
	pages = {189--205},
	address = {W{\"u}rzburg, Germany},
	year = {2020}
}
