This site is part of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 3099067.

Informa logo

Session Summary

June 2026

Explainable AI: The Key to Trustworthy Cyber Defense

Session hosted by: Francesco Leofante Assistant Professor, Imperial College London

As artificial intelligence continues to reshape industries from healthcare to finance and energy, one critical question remains: can we trust the decisions AI systems make? 

In 2026, this question has never been more urgent. With AI adoption accelerating across high-stakes sectors, the opacity of these systems—often described as "black boxes"—has emerged as a fundamental barrier to widespread implementation. This session summary explores the transformative role of explainable AI (XAI) in building trust, enhancing security, and enabling organisations to deploy AI systems with confidence.

Explainable AI: The Key to Trustworthy Cyber Defense

Full Session Summary

The Trust Deficit: Why AI Opacity Threatens Adoption

In sectors where decisions carry life-or-death consequences—such as medical diagnosis, financial risk assessment, or energy infrastructure management—stakeholders demand more than accurate predictions. They require comprehensible reasoning that can be audited, validated, and defended. The speaker emphasised that without explainability, organisations face mounting challenges:

  • Regulatory compliance barriers in industries governed by strict accountability standards.
  • User resistance stemming from inability to verify AI recommendations.
  • Security vulnerabilities that adversaries can exploit through the system's opacity.
  • Liability concerns when AI-driven decisions lead to adverse outcomes.

This trust deficit represents more than a technical challenge; it's a fundamental obstacle to realising AI's transformative potential in critical applications.

XAI Techniques: Illuminating the Black Box

The session detailed three powerful approaches to making AI systems more transparent and trustworthy:

Feature Attribution: Identifying Decision Drivers

Feature attribution techniques reveal which input factors most significantly influence an AI system's output. By highlighting the most influential elements in a decision, these methods enable users to identify potentially harmful triggers—particularly valuable for detecting prompt injection attacks where malicious inputs attempt to manipulate system behaviour. This visibility transforms AI from an inscrutable oracle into an auditable decision-support tool.

Counterfactual Reasoning: Understanding Alternative Outcomes

Counterfactual reasoning systematically varies inputs to determine which elements drive specific outputs. This "what-if" analysis approach identifies dangerous patterns in inputs, enabling organisations to implement pre-emptive measures that block or flag suspicious prompts before they compromise system integrity. The speaker positioned this technique as essential for proactive security in AI deployments.

Mechanistic Interpretability: Examining Internal Processes

Moving beyond input-output analysis, mechanistic interpretability examines the internal mechanisms of AI models to understand how features activate in response to prompts and identify behavioural patterns. This deeper level of analysis provides insights into the fundamental operations of AI systems, enabling more robust security measures and more reliable performance predictions.

Addressing Generative AI Challenges: Concept Unlearning

The session also tackled one of generative AI's most pressing challenges: mitigating biases and harmful outputs inherited from training data. The speaker outlined an innovative approach called concept unlearning, which removes harmful associations from text-to-image diffusion models whilst preserving their generative capabilities.

Traditional unlearning approaches were criticised as shallow and easily circumvented—a significant vulnerability in systems deployed at scale. To address this limitation, the speaker's laboratory developed a sophisticated technique that rearranges related concepts within models, achieving robust unlearning without degrading overall performance. This breakthrough represents a significant advancement in making generative AI systems safer and more aligned with ethical standards.

Conclusion: XAI as the Foundation of Trustworthy AI

The presentation concluded with a compelling argument: explainable AI must become a foundational element of trustworthy AI systems, not an optional enhancement. As organisations increasingly rely on AI for critical decisions, the ability to understand, audit, and validate these systems becomes paramount.

The speaker's research demonstrates that XAI delivers tangible benefits across three critical dimensions:

  • Enhanced Trust: By making AI reasoning transparent, organisations can build confidence amongst users, regulators, and stakeholders.
  • Improved Decision-Making: Understanding AI rationale enables human experts to validate recommendations and identify potential errors.
  • Stronger Security: Exposing how AI systems process inputs reveals vulnerabilities that adversaries might exploit, enabling proactive defence measures.

In 2026, as Gartner forecasts indicate that explainable AI will drive LLM observability investments to 50% of GenAI deployments by 2028, the message is clear: organisations that prioritise transparency and explainability will gain competitive advantages in trust, security, and regulatory compliance. The path forward requires integrating XAI principles from the earliest stages of AI development, ensuring that the systems shaping our future remain accountable, secure, and aligned with human values.


Key Takeaways

Explainable AI Is Essential for Trust and Safety

Explainable AI Is Essential for Trust and Safety

The session highlighted that opaque AI systems pose significant challenges in high-stakes sectors such as energy, finance, and healthcare. Explainable AI (XAI) techniques, such as feature attribution, counterfactual reasoning, and mechanistic interpretability, are crucial to building trust, ensuring accountability, and reducing the risk of harmful or adversarial outcomes.

Adversarial Attacks Exploit AI's Opacity

Adversarial Attacks Exploit AI's Opacity

The speaker underscored how a lack of transparency in AI systems creates vulnerabilities that adversaries can exploit. Techniques like prompt injection attacks demonstrate the need for robust explainability frameworks to identify and mitigate potentially harmful inputs before they lead to undesirable outcomes.

The speaker underscored how a lack of transparency in AI systems creates vulnerabilities that adversaries can exploit. Techniques like prompt injection attacks demonstrate the need for robust explainability frameworks to identify and mitigate potentially harmful inputs before they lead to undesirable outcomes.

Organisations Often Misunderstand AI Agent Use Cases

Mechanistic interpretability, which involves analysing internal model features and responses, was presented as a powerful tool to uncover biases and detect suspicious behaviour. This approach can help prevent the misuse of AI systems, such as the recovery of harmful concepts in generative AI models or the extraction of sensitive data from chatbots.

Thank You to Our 2026 Sponsors & Partners