2023 Australasian Actuarial Education and Research Symposium


Xi Xin

UNSW Sydney

Why you should not trust explanations in machine learning: an example of partial dependence plot


This is joint work with Giles Hooker and Fei Huang.

The adoption of artificial intelligence (AI) across industries, including insurance, has led to the widespread use of complex black-box models such as gradient-boosting machines and neural networks. Although these models offer gains in efficiency and accuracy, their lack of transparency has raised concerns among regulators and consumers. In response, interpretation methods from the growing field of interpretable machine learning have gained attention as tools for understanding the relationships between model inputs and outputs. However, while stakeholders may understand the limitations of these explanations, they are often unaware that the methods themselves are vulnerable to manipulation. Alongside the development of new interpretation methods, a growing body of literature warns against relying on these explanatory approaches because they can be unreliable and misleading.

This study uncovers the vulnerability of permutation-based interpretation methods, with a particular focus on partial dependence (PD) plots. We show that these methods are susceptible to adversarial attacks, demonstrating specifically how PD plots can be manipulated by exploiting the model's extrapolation behavior in regions induced by correlated features. Our work contributes to the existing literature by developing an adversarial framework that allows model developers to manipulate the outputs of PD plots. The framework assumes that auditors can access the black-box model and examine the dataset without making any modifications.
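For context, the mechanism behind this vulnerability can be seen from the standard definition of partial dependence and its usual Monte Carlo estimate (this formulation is standard in the interpretable machine learning literature and is included here only as background, not as part of the study's results). For a fitted model $f$, a feature subset of interest $S$, and complementary features $C$:

\[
\mathrm{PD}_S(x_S) \;=\; \mathbb{E}_{X_C}\!\left[ f(x_S, X_C) \right]
\;\approx\; \frac{1}{n}\sum_{i=1}^{n} f\!\left(x_S,\, x_C^{(i)}\right),
\]

where $x_C^{(i)}$ denotes the observed values of the complementary features for the $i$-th instance in the dataset. The estimate evaluates the model at constructed points $(x_S, x_C^{(i)})$ regardless of whether such combinations actually occur in the data. When the features in $S$ and $C$ are correlated, many of these points lie far from the data manifold, so the plotted curve reflects the model's behavior in regions with little or no training data, which is exactly the behavior a model developer can alter without materially changing predictions on the observed data.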

