Massachusetts Institute of Technology | Motional
Accepted in principle to Nature, June 2026

Explainable deep learning improves human mental models of self-driving cars

Eoin M. Kenny1*, Akshay Dharmavaram2, Sang Uk Lee2, Tung Phan-Minh2, Shreyas Rajesh2, Yunqing Hu2, Laura Major2, Momchil S. Tomov2,3†, Julie A. Shah1,4†
1 CSAIL, Massachusetts Institute of Technology | 2 Motional AD Inc. | 3 Harvard University | 4 Dept. of Aeronautics and Astronautics, MIT
These authors contributed equally to this work.

Interpretable-by-Design Architecture

Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. However, the opacity of these black-box planners makes it challenging to anticipate their failures. To address this, we introduce the Concept-Wrapper Network (CW-Net).

CW-Net is a method for faithfully explaining the behavior of machine-learning-based planners by causally grounding their reasoning in human-interpretable concepts. We replace the final reward layer of a pretrained deep neural network with a concept classifier, so that decisions are generated directly from interpretable scenarios such as "Approaching stopped vehicle" or "Close to cyclist."

Figure 1: CW-Net architecture grounding black-box reasoning in human-friendly concepts.
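The idea of routing decisions through a concept classifier can be illustrated with a minimal sketch. This is not the CW-Net implementation: the class name, the random weights, the feature dimension, and the linear mapping from concept probabilities to trajectory scores are all assumptions made for illustration. Only the concept labels come from this page. The point of the sketch is that the score of each candidate trajectory depends on the backbone features only through the concept probabilities, which is what allows the top concept to serve as the explanation.

```python
import numpy as np

# Concept labels taken from this page; everything else below is hypothetical.
CONCEPTS = [
    "Approaching stopped vehicle",
    "Close to cyclist",
    "Proximity to other vehicles",
]


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


class ConceptWrapperSketch:
    """Toy stand-in: frozen features -> concept classifier -> trajectory scores."""

    def __init__(self, feature_dim, num_concepts, seed=0):
        rng = np.random.default_rng(seed)
        # Concept classifier standing in for the replaced reward layer
        # (random weights here; a real system would train them).
        self.w_concept = rng.normal(size=(num_concepts, feature_dim))
        # Assumed linear map from concept probabilities to a trajectory score.
        self.w_score = rng.normal(size=num_concepts)

    def forward(self, trajectory_features):
        # trajectory_features: (num_trajectories, feature_dim), assumed to come
        # from a frozen pretrained backbone.
        concept_probs = np.array(
            [softmax(self.w_concept @ f) for f in trajectory_features]
        )
        # Scores are computed from concept probabilities only.
        scores = concept_probs @ self.w_score
        best = int(np.argmax(scores))
        explanation = CONCEPTS[int(np.argmax(concept_probs[best]))]
        return best, explanation, concept_probs, scores


model = ConceptWrapperSketch(feature_dim=8, num_concepts=len(CONCEPTS))
features = np.random.default_rng(1).normal(size=(5, 8))
best, explanation, probs, scores = model.forward(features)
print(f"Chosen trajectory {best}, explained by: {explanation}")
```

Because nothing bypasses the concept layer in this sketch, changing a concept probability necessarily changes the score, which is one simple way to read "causally grounded".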

Maintaining Expert Performance

A frequent critique of inherently interpretable models is the perceived trade-off between transparency and task performance. We evaluated CW-Net on the large-scale nuPlan benchmark using closed-loop simulations to verify driving capability.

The results demonstrate that CW-Net classifies concepts and provides causally faithful explanations without compromising the driving behavior of the original system. Our model exhibited less than a 1% difference across all safety, progress, and trajectory deviation metrics compared to the opaque baseline agent.
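A comparison of this kind can be sketched as a per-metric tolerance check. The sketch assumes the 1% figure is a relative difference against the baseline, which the text does not specify. The metric names and all numbers are placeholders for illustration, not results from the paper.

```python
def relative_difference(baseline, candidate):
    """Absolute difference expressed as a fraction of the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative difference")
    return abs(candidate - baseline) / abs(baseline)


def within_tolerance(baseline_metrics, candidate_metrics, tolerance=0.01):
    """Map each metric name to True if it differs by less than `tolerance`."""
    return {
        name: relative_difference(baseline_metrics[name], candidate_metrics[name])
        < tolerance
        for name in baseline_metrics
    }


# Placeholder values only; these are NOT numbers reported by the authors.
baseline = {"safety": 0.950, "progress": 0.880, "trajectory_deviation": 1.200}
wrapped = {"safety": 0.948, "progress": 0.884, "trajectory_deviation": 1.205}

report = within_tolerance(baseline, wrapped)
print(report)
```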

Surprising Situations in Real-World Deployment

To study the practical utility of CW-Net, we deployed the system on a real self-driving car with a safety driver in a semi-naturalistic study. The following setup and videos capture naturally occurring, surprising events where the explanations actively refined the driver's mental model of the vehicle's decision-making process.

Figure 2: Real-world deployment setup and CW-Net graphical user interface.
Approaching Stopped Vehicle (ASV) Hallucination
Proximity to Other Vehicles (CLOSE)
Cyclist Detection Limitations (BIKE)

Mental Model & Prediction Correlation

To validate that the improvements observed in real-world deployment replicate in larger populations, we conducted online studies with both expert test engineers and non-experts. We utilized nearest-neighbor and counterfactual prediction tasks to directly probe user mental models.

Our findings indicate that CW-Net explanations consistently shift user beliefs toward the ground-truth reasons for autonomous behavior. Crucially, as the "goodness" of a user's mental model improved, their ability to accurately predict the vehicle's future actions increased significantly.

Extended Data Fig. 5: Mental model improvement correlates with prediction accuracy.
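One common way to quantify a relationship like this is a Pearson correlation between a per-participant mental-model score and prediction accuracy. The page does not say which statistic the authors used, so treat the choice of Pearson's r, the function name, and the placeholder data below as assumptions for illustration only.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return float(np.corrcoef(x, y)[0, 1])


# Placeholder per-participant values; NOT data from the study.
mental_model_score = [0.2, 0.4, 0.5, 0.7, 0.9]
prediction_accuracy = [0.35, 0.40, 0.55, 0.60, 0.80]

r = pearson_r(mental_model_score, prediction_accuracy)
print(f"Pearson r = {r:.2f}")
```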

SAGAT: Pragmatic Usage and Situational Awareness

As a final, rigorous evaluation, we collected naturalistic scenarios from public roads in Las Vegas and deployed a large-scale online study using the Situation Awareness Global Assessment Technique (SAGAT).

The results provide definitive evidence of pragmatic utility: in surprising, out-of-distribution events, CW-Net explanations yielded large effect-size improvements in perception, comprehension, and projection. Conversely, the explanations did not degrade situational awareness during routine, unsurprising events. Below, each private-track deployment anomaly (left) is paired with its corresponding Las Vegas SAGAT surprising scenario (right), alongside key concept-activation analyses.
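As an aside on the effect-size claim above: a standardized effect size for two groups is often reported as Cohen's d with a pooled standard deviation, where values around 0.8 or more are conventionally called large. The page does not state which effect-size measure the study used, so the sketch below is an assumption, and the group scores are placeholders rather than study data.

```python
import math
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d for two independent groups, using the pooled sample variance."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


# Placeholder SAGAT-style scores (fraction of queries answered correctly);
# NOT data from the study.
with_explanations = [0.8, 0.7, 0.9, 0.6, 0.8]
without_explanations = [0.5, 0.6, 0.4, 0.6, 0.5]

d = cohens_d(with_explanations, without_explanations)
print(f"Cohen's d = {d:.2f}")
```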

Approaching Stopped Vehicle (ASV)

Private Track Deployment
Las Vegas SAGAT — Surprising Scenario 2
ASV Hallucination — Private Track
ASV Encounter — Las Vegas Public Road
ASV Deployment Analysis
ASV concept activation and SAGAT deployment analysis.

Proximity to Other Vehicles (CLOSE)

Private Track Deployment
Las Vegas SAGAT — Surprising Scenario 1
CLOSE Proximity — Private Track
CLOSE Encounter — Las Vegas Public Road
CLOSE Deployment Analysis
CLOSE concept activation and SAGAT deployment analysis.

Cyclist Detection (BIKE)

Private Track Deployment
Las Vegas SAGAT — Surprising Scenario 2
BIKE Detection Failure — Private Track
BIKE Encounter — Las Vegas Public Road
BIKE Deployment Analysis
BIKE concept activation and SAGAT deployment analysis.


Citation

If you find this work useful in your research, please consider citing:

@article{kenny2026explainable,
  title={Explainable deep learning improves human mental models of self-driving cars},
  author={Kenny, Eoin M. and Dharmavaram, Akshay and Lee, Sang Uk and Phan-Minh, Tung and Rajesh, Shreyas and Hu, Yunqing and Major, Laura and Tomov, Momchil S. and Shah, Julie A.},
  journal={Nature},
  year={2026}
}