2019

Journal article Open Access

A survey of methods for explaining black box models

Guidotti R., Monreale A., Ruggieri S., Turini F., Giannotti F., Pedreschi D.

Keywords: Computer Science - Computers and Society; Settore INF/01 - Informatica; Explanation; Interpretable Machine Learning; Computer Science (all); General Computer Science; Black Box; Computer Science - Learning; Transparent Model; Computers and Society (cs.CY); FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Transparent Models; Explanations; Interpretability; Interpretable Models; Theoretical Computer Science; Machine Learning (cs.LG); Open The Black Box; Computer Science - Artificial Intelligence

In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. Black box decision systems can be used in a wide variety of applications, and each approach is typically developed to solve a specific problem; as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to open black box models should also be useful for putting the many open research questions in perspective.

Source: ACM Computing Surveys 51 (2019). doi:10.1145/3236009

Publisher: Association for Computing Machinery, New York, N.Y., United States of America

Cite as

BibTeX entry

@article{oai:it.cnr:prodotti:397163,
	title = {A survey of methods for explaining black box models},
	author = {Guidotti, R. and Monreale, A. and Ruggieri, S. and Turini, F. and Giannotti, F. and Pedreschi, D.},
	publisher = {Association for Computing Machinery, New York, N.Y., United States of America},
	doi = {10.1145/3236009},
	note = {Also available as an arXiv preprint, doi:10.48550/arxiv.1802.01933},
	journal = {ACM Computing Surveys},
	volume = {51},
	year = {2019}
}