Accepted in principle to Nature, June 2026

Explainable deep learning improves human mental models of self-driving cars

Eoin M. Kenny¹*, Akshay Dharmavaram², Sang Uk Lee², Tung Phan-Minh², Shreyas Rajesh², Yunqing Hu², Laura Major², Momchil S. Tomov²,³†, Julie A. Shah¹,⁴†

¹ CSAIL, Massachusetts Institute of Technology | ² Motional AD Inc. | ³ Harvard University | ⁴ Dept. of Aeronautics and Astronautics, MIT

† These authors contributed equally to this work.


Interpretable-by-Design Architecture

Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. However, the opacity of these black-box planners makes it challenging to anticipate their failures. To address this, we introduce the Concept-Wrapper Network (CW-Net).

CW-Net faithfully explains the behavior of machine-learning-based planners by causally grounding their reasoning in human-interpretable concepts. We replace the final reward layer of a pretrained deep neural network with a concept classifier, so that decisions are generated directly from interpretable concepts such as "Approaching stopped vehicle" or "Close to cyclist."

Figure 1: CW-Net architecture grounding black-box reasoning in human-friendly concepts.
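To make the idea of a concept-based final layer concrete, here is a minimal sketch in plain Python. It is not the CW-Net implementation. It assumes the planner scores a fixed set of candidate trajectories, that the backbone output is a small feature vector, and that both the concept classifier and the concept-to-score mapping are linear; the function names, weights, and trajectory labels are all hypothetical.

```python
import math

# The two concept names are quoted from the text; everything else is illustrative.
CONCEPTS = ["Approaching stopped vehicle", "Close to cyclist"]
TRAJECTORIES = ["keep speed", "brake"]  # hypothetical candidate trajectories


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def concept_activations(features, concept_weights):
    """One logistic unit per concept, applied to frozen backbone features."""
    return [sigmoid(dot(w, features)) for w in concept_weights]


def plan(features, concept_weights, trajectory_weights):
    """Score trajectories from concept activations only and explain the choice."""
    acts = concept_activations(features, concept_weights)
    scores = [dot(w, acts) for w in trajectory_weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    # Per-concept contribution to the winning trajectory's score.
    contributions = {
        name: w * a
        for name, w, a in zip(CONCEPTS, trajectory_weights[best], acts)
    }
    return best, contributions


# Made-up inputs: a 2-dimensional "backbone feature" and hand-picked weights.
FEATURES = [2.0, -1.0]
CONCEPT_WEIGHTS = [[3.0, 0.0], [0.0, 3.0]]
TRAJECTORY_WEIGHTS = [[-2.0, -2.0], [2.0, 2.0]]

best, contributions = plan(FEATURES, CONCEPT_WEIGHTS, TRAJECTORY_WEIGHTS)
print(TRAJECTORIES[best], contributions)
```

Because the trajectory scores depend on the features only through the concept activations, the per-concept contributions account for the whole decision in this toy setting, which is the sense in which such an explanation can be called faithful.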

Maintaining Expert Performance

A frequent critique of inherently interpretable models is the perceived trade-off between transparency and task performance. We evaluated CW-Net on the large-scale nuPlan benchmark using closed-loop simulations to verify driving capability.

The results demonstrate that CW-Net classifies concepts and provides causally faithful explanations without compromising the driving behavior of the original system. Our model exhibited less than a 1% difference across all safety, progress, and trajectory deviation metrics compared to the opaque baseline agent.
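A comparison of this kind can be expressed as a per-metric tolerance check. The sketch below assumes the 1% figure is a relative difference with respect to the baseline (the text does not say whether it is relative or absolute), and the metric names and numbers are placeholders, not results from the paper.

```python
def relative_difference(baseline, wrapped):
    """Absolute difference of `wrapped` from `baseline`, as a fraction of baseline."""
    if baseline == 0:
        raise ValueError("baseline metric must be non-zero")
    return abs(wrapped - baseline) / abs(baseline)


def within_tolerance(baseline_metrics, wrapped_metrics, tolerance=0.01):
    """Map each metric name to True if the two agents differ by less than `tolerance`."""
    return {
        name: relative_difference(baseline_metrics[name], wrapped_metrics[name]) < tolerance
        for name in baseline_metrics
    }


# Placeholder values for illustration only; they are not taken from nuPlan or the paper.
baseline = {"safety": 0.900, "progress": 0.800}
wrapped = {"safety": 0.905, "progress": 0.796}
print(within_tolerance(baseline, wrapped))
```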

Surprising Situations in Real-World Deployment

To study the practical utility of CW-Net, we deployed the system on a real self-driving car with a safety driver in a semi-naturalistic study. The following setup and videos capture naturally occurring, surprising events where the explanations actively refined the driver's mental model of the vehicle's decision-making process.

Figure 2: Real-world deployment setup and CW-Net graphical user interface.

Approaching Stopped Vehicle (ASV) Hallucination

Proximity to Other Vehicles (CLOSE)

Cyclist Detection Limitations (BIKE)

Mental Model & Prediction Correlation

To test whether the improvements observed in real-world deployment replicate in larger populations, we conducted online studies with both expert test engineers and non-experts. We used nearest-neighbor and counterfactual prediction tasks to directly probe user mental models.

Our findings indicate that CW-Net explanations consistently shift user beliefs toward the ground-truth reasons for autonomous behavior. Crucially, as the "goodness" of a user's mental model improved, their ability to accurately predict the vehicle's future actions increased significantly.

Extended Data Fig. 5: Mental model improvement correlates with prediction accuracy.
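One simple way to quantify a relationship like the one in Extended Data Fig. 5 is a correlation coefficient between a per-participant mental-model score and that participant's prediction accuracy. The sketch below uses Pearson's r as an assumption; the statistic used in the paper is not stated on this page, and the variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative inputs only: mental-model scores and prediction accuracies per participant.
mental_model_scores = [1, 2, 3, 4]
prediction_accuracy = [0.2, 0.4, 0.6, 0.8]
print(pearson_r(mental_model_scores, prediction_accuracy))
```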

SAGAT: Pragmatic Usage and Situational Awareness

As a final, rigorous evaluation, we collected naturalistic scenarios from public roads in Las Vegas and deployed a large-scale online study using the Situation Awareness Global Assessment Technique (SAGAT).

The results provide definitive evidence of pragmatic utility: in surprising, out-of-distribution events, CW-Net explanations yielded large effect-size improvements in perception, comprehension, and projection. At the same time, the explanations did not degrade situational awareness during routine, unsurprising events. Below, each private-track deployment anomaly (left) is paired with its corresponding Las Vegas SAGAT surprising scenario (right), alongside key concept-activation analyses.

Approaching Stopped Vehicle (ASV)

Private Track Deployment

Las Vegas SAGAT — Surprising Scenario 2

ASV Hallucination — Private Track

ASV Encounter — Las Vegas Public Road

ASV concept activation and SAGAT deployment analysis.

Proximity to Other Vehicles (CLOSE)

Private Track Deployment

Las Vegas SAGAT — Surprising Scenario 1

CLOSE Proximity — Private Track

CLOSE Encounter — Las Vegas Public Road

CLOSE concept activation and SAGAT deployment analysis.

Cyclist Detection (BIKE)

Private Track Deployment

Las Vegas SAGAT — Surprising Scenario 2

BIKE Detection Failure — Private Track

BIKE Encounter — Las Vegas Public Road

BIKE concept activation and SAGAT deployment analysis.
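For readers unfamiliar with effect sizes, the sketch below shows how a standardized difference between two groups of SAGAT scores could be computed. It assumes Cohen's d with a pooled standard deviation; the effect-size measure used in the paper is not stated on this page, and the score lists are made up for illustration.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    var1 = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("effect size is undefined when both groups are constant")
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


# Made-up scores for one SAGAT level (e.g. comprehension), with and without explanations.
with_explanations = [2, 4, 6]
without_explanations = [1, 3, 5]
print(cohens_d(with_explanations, without_explanations))
```

By a common convention, values of d around 0.8 or above are described as large, although the thresholds are rules of thumb rather than fixed cutoffs.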


Citation

If you find this work useful in your research, please consider citing:

@article{kenny2026explainable,
  title={Explainable deep learning improves human mental models of self-driving cars},
  author={Kenny, Eoin M. and Dharmavaram, Akshay and Lee, Sang Uk and Phan-Minh, Tung and Rajesh, Shreyas and Hu, Yunqing and Major, Laura and Tomov, Momchil S. and Shah, Julie A.},
  journal={Nature},
  year={2026}
}