AIR-FM: Assessing and Improving Reliability of Foundation Models in the Real World Workshop@AAAI 2026

Call for Papers

Overview and Scope:

Despite remarkable advances in capability, foundation models such as large language models (LLMs) and vision-language models (VLMs) face fundamental challenges in maintaining reliability under real-world conditions. Their stochastic nature and sensitivity to context make them vulnerable to distribution shifts, sensor noise, hallucinations, overconfidence, and prompt variability. These issues limit safe deployment in critical domains such as healthcare, law, robotics, and autonomous driving.

This workshop will serve as a forum for researchers and practitioners to discuss definitions, metrics, and methods for reliability quantification, explore principled evaluation frameworks, and propose strategies to enhance robustness and trustworthiness across language and vision tasks. By bridging the LLM and VLM communities, we aim to foster cross-domain insights, stimulate the creation of realistic stress-test datasets, and encourage approaches that ensure dependable performance in operational settings.

OpenReview submission: https://openreview.net/group?id=AAAI.org/2026/Workshop/AIR-FM

Topics of Interest:

We welcome original contributions from probabilistic machine learning, statistics, engineering, NLP, HCI, and related fields. Submissions may address (but are not limited to) the following topics:

  • Failure Mode Analysis: Characterizing unreliability in LLMs and VLMs under real-world conditions, including domain shifts, adversarial inputs, and sensor degradations.

  • Reliability-Centered Datasets: Designing datasets to expose vulnerabilities, long-tail phenomena, or multi-modal inconsistencies.

  • Metrics and Evaluation Frameworks: Developing measures that capture robustness, calibration, and generalization beyond accuracy or average precision.

  • Reliability-Aware Architectures and Training: Model designs and learning paradigms that explicitly target dependable performance in realistic scenarios.

  • Uncertainty Estimation and Detection: Predicting, detecting, and mitigating unreliable outputs before deployment.

  • Security, Hallucination, and Prompt Sensitivity: Red-teaming, jailbreak detection, and methods to reduce context-driven unreliability.

  • Cross-Domain Reliability Insights: Lessons and techniques transferable between language-only, vision-only, and multi-modal systems.

Submission Details:

All submissions must be prepared in the AAAI 2026 format.

Please follow the author guidelines set by AAAI 2026.

Submissions are accepted in the following three tracks:

  • Workshop Paper Track: for novel work, submitted as papers of up to 4 pages in AAAI format.
  • Nectar Track: for presenting already-accepted work at this workshop.
  • Tutorial Track: for demos or tutorials on topics or tools related to the workshop.

Please select the appropriate track for submission.

Important Dates (Tentative):

  • Submission Deadline: October 15, 2025
  • Notification of Acceptance: November 5, 2025
  • Camera-Ready Deadline: December 10, 2025
  • Workshop Date: Monday, January 26, 2026

For questions or inquiries, please contact us at: air-fm@googlegroups.com