Exploring the Trade-offs in Explainable AI: Accuracy vs. Interpretability
Abstract
The increasing deployment of artificial intelligence in high-stakes domains such as healthcare, finance, and public policy has intensified the demand for explainable models. Yet achieving explainability often introduces a trade-off with predictive accuracy, raising fundamental questions about trust, usability, and fairness in AI-driven decision-making. This paper examines the accuracy-interpretability trade-off in explainable AI (XAI), drawing upon empirical studies, theoretical frameworks, and emerging techniques. Prior work highlights the challenge of balancing competing stakeholder expectations, while surveys of interpretability methods illustrate the diversity of approaches, ranging from post-hoc explanations to inherently interpretable models.
Recent advances such as concept embedding models, semantic similarity controllers, and explainable reinforcement learning show promise in mitigating this trade-off by producing models that are both accurate and interpretable. In parallel, efforts to unify adversarial robustness and interpretability underscore the need for models that are both secure and transparent, while initiatives such as OpenXAI provide standardized frameworks for evaluating explanation quality. Beyond technical solutions, legal and ethical perspectives stress the importance of trustworthiness-by-design, compliance with regulatory frameworks such as the GDPR, and accountability in automated decision-making.
This paper contributes by synthesizing insights across technical, ethical, and legal domains and by analyzing the conditions under which interpretability can be achieved without significant accuracy loss. It argues that, although the trade-off persists, hybrid approaches combining advanced algorithms, standardized evaluation, and human-centered design can narrow the gap and move toward AI systems that are transparent, trustworthy, and effective.
Keywords
Explainable AI, Model Interpretability, Inherently Interpretable Models