AITF M1.11-Art04 v1.0 Reviewed 2026-04-06 Open Access
AITF · Foundations

Explainability and Interpretability: When and How to Apply Each


Article 4 of 15

Why Both Concepts Matter

The recurring confusion between explainability and interpretability obscures a real engineering choice. A linear regression model is interpretable: the coefficients can be inspected, and a domain expert can confirm whether they make sense. A deep neural network is not interpretable in the same way — its hundreds of millions of parameters do not yield to inspection — but it can be made explainable through techniques that produce post-hoc accounts of why a particular input produced a particular output.

The choice between an interpretable model and an explainable model is not always free. For some problems, interpretable models are sufficient and even superior. For others — image recognition, natural language understanding, complex tabular problems with non-linear interactions — interpretable model families simply do not achieve adequate accuracy, and the choice becomes between an opaque model with explanations attached and no model at all.

A well-known critique by Cynthia Rudin (Nature Machine Intelligence, 2019) argues that for high-stakes decisions, organizations should use intrinsically interpretable models wherever possible and should resist the temptation to deploy opaque models with bolted-on explanations that may themselves be unfaithful to the underlying decision logic. The argument has been influential in policy debates about high-risk AI in healthcare, criminal justice, and lending.

The Interpretability Spectrum

Models can be ranked along a spectrum of intrinsic interpretability.

Highly interpretable. Linear and logistic regression, single decision trees of modest depth, generalized additive models (GAMs), rule lists, and scoring systems. These models can be presented to a domain expert as a small number of weights or a short rule set, and the expert can verify or contest each component.

Moderately interpretable. Random forests of modest size, gradient-boosted trees with global feature importance summaries, and shallow neural networks. The full model is too large to inspect end-to-end, but feature importance, partial dependence plots, and tree paths give domain experts substantial insight.

Low interpretability. Deep neural networks, large transformer models, and large ensembles. The model’s behavior must be characterized through external probing rather than internal inspection.
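
To make the highly interpretable end of the spectrum concrete, the sketch below fits a logistic regression and prints its coefficients as log-odds contributions a domain expert can verify or contest. The dataset and feature names are illustrative assumptions, not part of the methodology.

    # Minimal sketch: inspecting an intrinsically interpretable model.
    # The data and feature names below are illustrative assumptions.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    X = pd.DataFrame({
        "income_thousands":   [32, 54, 21, 78, 45, 60, 29, 90],
        "credit_utilization": [0.8, 0.3, 0.9, 0.2, 0.5, 0.4, 0.7, 0.1],
        "late_payments_24m":  [3, 0, 5, 0, 1, 1, 4, 0],
    })
    y = np.array([0, 1, 0, 1, 1, 1, 0, 1])  # 1 = approved

    model = LogisticRegression().fit(X, y)

    # Each coefficient is a log-odds contribution the expert can verify or
    # contest: does higher income really raise the odds of approval?
    for name, coef in zip(X.columns, model.coef_[0]):
        print(f"{name:>20}: {coef:+.3f}")
    print(f"{'intercept':>20}: {model.intercept_[0]:+.3f}")
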

The choice turns on the interaction between accuracy requirements and the consequences of error. A retail product recommendation can use the most accurate model available because the cost of an individual error is low. A consumer credit decision must satisfy regulatory explainability requirements (the US Equal Credit Opportunity Act requires lenders to provide adverse action notices that explain the decision), which often means using a more interpretable model even at some accuracy cost.

Post-Hoc Explainability Techniques

When opacity is unavoidable, three families of post-hoc techniques produce explanations.

Feature attribution methods. SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) compute a numeric contribution for each input feature to a particular prediction. SHAP, based on cooperative game theory, satisfies several desirable mathematical properties (efficiency, symmetry, dummy, additivity) that LIME does not, and it has become the de facto industry standard for tabular and structured data. Both techniques are widely implemented in open-source libraries (the shap and lime Python packages) and are routinely required in regulated industries.
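
As a concrete illustration of feature attribution, the following sketch applies the open-source shap package to a synthetic tabular classifier. The dataset, feature names, and model choice are illustrative assumptions; the same pattern applies to other tree-based models.

    # Minimal sketch: SHAP feature attribution on tabular data.
    # Dataset and model choice are illustrative assumptions.
    import pandas as pd
    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
    X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])

    model = GradientBoostingClassifier().fit(X, y)

    # TreeExplainer computes Shapley values efficiently for tree ensembles.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Local explanation: per-feature contributions to a single prediction.
    print(dict(zip(X.columns, shap_values[0])))

    # Global view: mean absolute contribution per feature across the dataset,
    # the kind of summary an auditor or regulator typically asks for.
    print(pd.DataFrame(abs(shap_values), columns=X.columns).mean().sort_values(ascending=False))
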

Counterfactual explanations. Rather than explaining what features drove a decision, counterfactual methods generate the smallest change to the input that would have produced a different output. “Your loan would have been approved if your annual income had been $4,000 higher and your credit utilization had been 10 percentage points lower.” Counterfactuals are particularly useful for affected individuals because they translate the model’s behavior into actionable advice. The approach was popularized by Wachter, Mittelstadt, and Russell in 2018 and has been adopted as a recommended explanation type by the UK Information Commissioner’s Office.
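
The sketch below illustrates the idea with a deliberately simplified greedy search that perturbs a small set of mutable numeric features until the model's decision flips. It is not the Wachter et al. optimization; production implementations add distance metrics, plausibility constraints, and categorical-feature handling. The function name, feature indices, and step sizes are illustrative assumptions.

    # Simplified counterfactual search: find the smallest change (in steps)
    # to a few mutable features that flips the model's decision.
    import itertools

    def find_counterfactual(model, x, feature_steps, max_steps=20):
        """model: fitted classifier with predict(); x: 1-D numpy array
        (the original instance); feature_steps: {feature_index: per-step
        change}, e.g. {2: 1000.0, 4: -0.01}."""
        original = model.predict(x.reshape(1, -1))[0]
        # Enumerate combinations of step counts, smallest total change first.
        for total in range(1, max_steps + 1):
            for combo in itertools.product(range(total + 1), repeat=len(feature_steps)):
                if sum(combo) != total:
                    continue
                candidate = x.copy()
                for (idx, step), n in zip(feature_steps.items(), combo):
                    candidate[idx] += n * step
                if model.predict(candidate.reshape(1, -1))[0] != original:
                    return candidate  # first (smallest) change that flips the decision
        return None  # no counterfactual found within the search budget

    # Hypothetical usage for a denied applicant, with income at index 2 and
    # credit utilization at index 4:
    #   cf = find_counterfactual(model, x_denied, {2: 1000.0, 4: -0.01})
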

Surrogate models. A simpler interpretable model is fit to mimic the predictions of the complex model on a relevant region of input space. The surrogate is then used to explain the complex model’s behavior in that region. Surrogate explanations can be misleading if the surrogate’s fit is imperfect, and best practice requires reporting the fidelity of the surrogate alongside its explanations.
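
A minimal sketch of a global surrogate, assuming an opaque random forest and a shallow decision-tree surrogate: the surrogate is trained on the opaque model's predictions rather than the true labels, and its fidelity is reported so it can accompany any explanation drawn from it.

    # Minimal sketch: global surrogate with fidelity reporting.
    # The opaque model and data are illustrative assumptions.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

    opaque = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

    # Train the surrogate on the opaque model's predictions, not the true labels.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, opaque.predict(X))

    # Fidelity: how often the surrogate agrees with the opaque model.
    # Report this number alongside any explanation drawn from the surrogate.
    fidelity = accuracy_score(opaque.predict(X), surrogate.predict(X))
    print(f"surrogate fidelity: {fidelity:.2%}")
    print(export_text(surrogate))
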

A critical subtlety: post-hoc explanations describe what the model does, not necessarily why it does it. An explanation that says a loan was denied “because of late payments” may correctly identify the feature with the highest SHAP value while obscuring the fact that the model relies on a proxy for race that happens to correlate with late payments. Explanations are necessary but not sufficient — they should be paired with the bias detection and mitigation work described in Article 3.

Audience-Driven Explanation Design

Different audiences need different explanations. A common failure mode is producing one explanation type and serving it to all stakeholders.

Affected individuals need explanations that are concise, in plain language, and actionable. A credit applicant denied a loan does not benefit from a SHAP plot; they benefit from a short statement of the top two or three reasons and, where possible, a counterfactual that suggests what would change the outcome. Article 22 of the EU General Data Protection Regulation restricts fully automated decisions with legal or similarly significant effects, and the regulation's transparency provisions require that affected individuals receive meaningful information about the logic involved.

Domain experts (a clinician using a diagnostic decision support tool, an underwriter reviewing a fraud alert) need explanations that integrate with their existing reasoning. Feature attributions, counterfactuals, and case-based explanations (“here are three similar past cases and how they resolved”) are typically more useful than summary statements.

Regulators and auditors need explanations that document the model’s behavior across the full input distribution, not just on individual cases. Global feature importance, partial dependence plots, fairness metrics across subgroups, and stability analyses under distribution shift are the typical evidence types.
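
For this audience the evidence is global rather than per-case. The sketch below produces two of the standard artifacts with scikit-learn: permutation feature importance on held-out data and a partial dependence curve for a single feature. The model and data are illustrative assumptions.

    # Sketch: global behavioral evidence for regulators and auditors.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import partial_dependence, permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = GradientBoostingClassifier().fit(X_train, y_train)

    # Global importance: how much held-out performance drops when each
    # feature is shuffled.
    imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    print(imp.importances_mean)

    # Partial dependence: the model's average response as one feature varies.
    pd_result = partial_dependence(model, X_test, features=[0])
    print(pd_result["average"])
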

Internal governance bodies (the ethics review board described in Article 7) need explanations that support a go/no-go decision. They typically want to see what the model relies on, where it is uncertain, where it has been most wrong in testing, and what the worst-case behaviors look like.

A complete explainability program designs all four explanation types from the start, not just whichever one happens to be easiest given the chosen technique.

Regulatory Requirements

Explainability is increasingly mandated. The EU AI Act requires high-risk systems to be designed for “appropriate transparency” and to provide instructions for use that allow deployers to interpret the system’s output. Article 13 of the Act spells out the documentation requirements in detail. The EU HLEG Ethics Guidelines for Trustworthy AI list transparency and explicability as a core requirement; see https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai.

In the US, sector-specific rules already impose explainability obligations: the Equal Credit Opportunity Act for credit, the Fair Credit Reporting Act for consumer reports, and various state insurance regulations. The Algorithmic Accountability Act introduced in Congress (H.R. 5628) would extend such requirements broadly across “augmented critical decision processes”; see https://www.congress.gov/bill/118th-congress/house-bill/5628.

Internationally, the OECD AI Principles include “transparency and explainability” as one of the five values-based principles; see https://oecd.ai/en/ai-principles. The Singapore IMDA Model AI Governance Framework provides operational guidance scaled by use-case risk; see https://www.pdpc.gov.sg/help-and-resources/2020/01/model-ai-governance-framework.

When Explanations Are Insufficient

Two situations call for going beyond explanations to either redesign or refusal.

The first is when explanations cannot be produced at the fidelity required. For some opaque systems — large language models being a current example — the relationship between inputs and outputs is so complex that even SHAP and LIME explanations may be unstable across runs or unfaithful to the model’s actual reasoning. In high-stakes use cases, “we cannot reliably explain why the model produced this output” is a legitimate ground for declining to deploy.

The second is when an explanation, even if accurate, would be insufficient for the affected individual to challenge or contest the decision. The right to meaningful contestation, recognized in the EU AI Act and in several international human rights frameworks, requires more than a feature attribution. It requires that the affected individual be able to introduce new information, request human review, and receive a substantive response.

Maturity Indicators

  • Level 1: No explanations are produced for any model.
  • Level 2: Some models produce feature importance summaries; explanations are technical and audience-undifferentiated.
  • Level 3: High-risk models produce audience-appropriate explanations (affected individual, domain expert, regulator). Explanation type and fidelity are documented in the model card.
  • Level 4: Explanations are produced automatically for every consequential decision and stored as part of the audit record. Counterfactual explanations are available for adverse decisions. Explanation fidelity is monitored over time.
  • Level 5: Explanation quality is reported externally; the organization contributes to explainability standards; explanation generation is part of the product surface that customers explicitly evaluate.

Practical Application

Three steps to start. First, classify each production model on the interpretability spectrum (high, moderate, low) and on the consequences of an individual decision (low, medium, high). Models in the “low interpretability + high consequences” cell are the priority for explanation work. Second, deploy SHAP for tabular models and counterfactual explanations for the highest-stakes binary decisions; both have mature open-source implementations. Third, establish a process for capturing and storing the explanation that accompanied every consequential automated decision, so that explanations exist if and when an audit, complaint, or regulatory inquiry arrives.
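
The third step can be as simple as a structured audit table keyed by decision identifier. The sketch below stores the outcome, top attributed reasons, and any counterfactual for each consequential decision in SQLite; the schema, field names, and identifiers are illustrative assumptions rather than a prescribed format.

    # Sketch: persist the explanation attached to each consequential decision
    # so it can be retrieved during an audit, complaint, or regulatory inquiry.
    import json
    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect("decision_audit.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS decision_explanations (
            decision_id    TEXT PRIMARY KEY,
            model_version  TEXT NOT NULL,
            decided_at     TEXT NOT NULL,
            outcome        TEXT NOT NULL,
            top_reasons    TEXT NOT NULL,  -- JSON list of (feature, attribution)
            counterfactual TEXT            -- JSON, present for adverse decisions
        )
    """)

    def record_explanation(decision_id, model_version, outcome, top_reasons, counterfactual=None):
        conn.execute(
            "INSERT INTO decision_explanations VALUES (?, ?, ?, ?, ?, ?)",
            (
                decision_id,
                model_version,
                datetime.now(timezone.utc).isoformat(),
                outcome,
                json.dumps(top_reasons),
                json.dumps(counterfactual) if counterfactual else None,
            ),
        )
        conn.commit()

    # Hypothetical adverse credit decision stored with its reasons and counterfactual.
    record_explanation(
        decision_id="app-2026-000431",
        model_version="credit-risk-v3.2",
        outcome="denied",
        top_reasons=[("late_payments_24m", 0.41), ("credit_utilization", 0.27)],
        counterfactual={"annual_income": "+4000", "credit_utilization": "-0.10"},
    )
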

Looking Ahead

Article 5 takes up the related but distinct topic of human oversight — the people, processes, and authorities through which an organization keeps meaningful control over its AI systems. Explanations make oversight possible; oversight is what makes explanations matter.


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.