Expert Interview, June 2026
From Experimentation to Production
Building Reliable AI for Compliance
At The AI Summit London, we spoke with Pratima Upadhyay, Software Engineer at Airbnb, following her session, Reality | Operationalising LLMs in High-Stakes Domains: Building Reliable AI Systems.
In this interview, Pratima shares her career journey, practical lessons from deploying AI in production, and how her team balances reliability, latency and cost in complex compliance workflows.

Read the Full Interview
Interviewer: Hello and welcome to The AI Summit London. We are joined by Pratima Upadhyay, Software Engineer at Airbnb. You have an extensive background. Could you guide me through your journey and how you arrived at Airbnb?
Pratima: I completed my bachelor’s degree in 2019, majoring in Information Technology. Right after graduation I joined Microsoft, where I worked on the Azure Backup team. I spent a lot of time in cloud infrastructure and distributed systems, and I became fascinated by how distributed workflows operate and how data is retrieved from multiple sources.
After three years at Microsoft, I felt it was time to explore a different domain. I have always been passionate about travel, and I was a long‑time fan of Airbnb’s technical content, which I would read in my free time. As it happened, a recruiter from Airbnb reached out to me on LinkedIn about an open role and asked if I would be interested in interviewing. I went through a two‑month process with multiple coding and system design rounds, and I was delighted to receive an offer.
One of the big draws was that the role was in the Payments team. Payments is a consistently important area in technology, and I wanted exposure to fintech. That opportunity appealed to me, so I joined. It has now been around three and a half years. I work on the Payments Compliance team, and it has been fascinating to understand how payment workflows and compliance operate at an organisational level. I am looking forward to learning more and continuing to grow in this role.
Interviewer: In your session you discussed operationalising LLMs in high-stakes domains. What are the key challenges you face in building reliable AI systems for compliance and risk management, and how have you addressed them?
Pratima: A major challenge with deploying AI systems to production is their probabilistic nature. Traditional software engineering is deterministic: we provide an input and we know exactly what output to expect. With LLMs we cannot predict a single correct output in the same way, so strong evaluation and observability are essential.
Evaluating AI systems requires robust testing frameworks and continuous monitoring. Observability also looks different from traditional systems because of the non‑deterministic behaviour. We need constant feedback loops and guardrails to ensure results are accurate and to reduce the risk of hallucinations.
Latency is another key concern. When AI is embedded in production workflows used by analysts, we must keep response times low so that we do not slow down case work. That means being thoughtful about model choice, prompt design, caching and retrieval strategies, and ensuring the overall system remains responsive.
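The evaluation point above, that the same input can produce different outputs, can be illustrated with a small consistency check. This is a minimal sketch and not a description of Airbnb's tooling: `consistency_rate`, `make_stub_model`, the prompt text, the output labels and the 0.9 threshold are all hypothetical, and the stub simply replays fixed outputs in place of a real model call. It also compares outputs by exact match, which suits short labels such as a suggested next action; free-text summaries would need a fuzzier comparison.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Iterable


def consistency_rate(generate: Callable[[str], str], prompt: str, runs: int = 20) -> float:
    """Return the share of runs that produced the most common output."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    outputs = [generate(prompt) for _ in range(runs)]
    _, top_count = Counter(outputs).most_common(1)[0]
    return top_count / runs


def make_stub_model(outputs: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call: replays a fixed cycle of outputs."""
    replay = cycle(outputs)
    return lambda prompt: next(replay)


# Assumed example data: one stub always agrees with itself, the other does not.
stable = make_stub_model(["escalate"])
flaky = make_stub_model(["escalate", "escalate", "escalate", "close"])

prompt = "Suggest the next action for this case"  # hypothetical prompt
stable_rate = consistency_rate(stable, prompt, runs=20)
flaky_rate = consistency_rate(flaky, prompt, runs=20)

MIN_CONSISTENCY = 0.9  # hypothetical quality gate, chosen for illustration
stable_ok = stable_rate >= MIN_CONSISTENCY
flaky_ok = flaky_rate >= MIN_CONSISTENCY
```

A check like this could run as part of a test suite and again on sampled production traffic, so that a drop in agreement shows up as a monitoring signal rather than as an analyst complaint.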
Interviewer: You touched on workflows. How can AI systems be designed to coordinate multiple data sources and tools effectively? What impact have you seen on productivity in complex workflows and case management environments?
Pratima: A common issue we see is not a lack of intelligence but fragmented information. Production environments are messy. Data is scattered across many platforms. In compliance investigations, an analyst may handle sanctions, anti‑money laundering or KYC cases. The challenge is not only knowing the next action in the workflow but also gathering the right information to support the decision.
Within Airbnb there are different platforms an analyst might need to consult. An effective AI system retrieves the necessary information from across those systems and presents it in one place within the case management workflow. For example, there may be extensive conversation history between a host and a previous analyst. When a new analyst takes over, the first task is to understand the current stage of the investigation. Instead of manually sifting through historical exchanges, the AI can recognise that a new analyst has claimed the case, identify the relevant history and provide a clear summarisation of prior conversations, case notes and documents. This significantly reduces case handling time by speeding up context gathering and helping the analyst understand exactly where the investigation stands. We apply similar summarisation techniques across case notes, case histories and uploaded documents, which further reduces average handling time.
Interviewer: What challenges do you face when balancing cost, reliability and latency in your AI architecture, and how are you approaching them?
Pratima: Every new orchestration layer we add can increase the time taken to retrieve information, so we need to be careful. When we introduce a new layer of orchestration or reasoning, we evaluate whether it genuinely reduces latency or whether it increases the overall average handling time (AHT) of a case.
It is not always necessary to use the most powerful model. A simpler model can be sufficient for tasks such as summarisation if it is reliable and consistent. Speed matters, but it is one factor among several. We consciously trade off speed, consistency and reliability, and we choose the smallest, fastest option that still meets quality thresholds for the specific task. That approach helps us control cost while maintaining responsiveness and trustworthiness.
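The selection rule described above (the smallest, fastest option that still meets the quality threshold) can be sketched as a simple filter followed by a minimum. Everything here is assumed for illustration: the `ModelOption` fields, the model names and all of the scores, latencies and costs are made up, and in practice the quality figure would come from an offline evaluation set for the specific task.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: float          # offline evaluation score in [0, 1] for this task
    p95_latency_ms: float   # measured 95th-percentile response time
    cost_per_call: float    # arbitrary currency units


def pick_model(
    options: Sequence[ModelOption],
    min_quality: float,
    max_latency_ms: float,
) -> Optional[ModelOption]:
    """Return the cheapest option that meets both thresholds, or None."""
    eligible = [
        m for m in options
        if m.quality >= min_quality and m.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    # Cheapest first; break ties in favour of the faster model.
    return min(eligible, key=lambda m: (m.cost_per_call, m.p95_latency_ms))


# Hypothetical candidates with invented numbers.
candidates = [
    ModelOption("small", quality=0.86, p95_latency_ms=400, cost_per_call=0.001),
    ModelOption("medium", quality=0.92, p95_latency_ms=900, cost_per_call=0.004),
    ModelOption("large", quality=0.97, p95_latency_ms=2500, cost_per_call=0.020),
]

summarisation_choice = pick_model(candidates, min_quality=0.85, max_latency_ms=1000)
stricter_choice = pick_model(candidates, min_quality=0.90, max_latency_ms=1000)
no_choice = pick_model(candidates, min_quality=0.95, max_latency_ms=1000)
```

Returning `None` when nothing qualifies makes the trade-off explicit: either the latency budget or the quality threshold has to be revisited, instead of silently falling back to the largest model.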
Interviewer: Many organisations are struggling to move from experimentation to reliable production use for AI systems. Could you share some strategies or frameworks you have found most effective?
Pratima: I have observed that many AI initiatives succeed in experimentation but fail in real-world production. That often happens because production systems introduce operational constraints such as latency, compliance, trust, risk, and safety. We need to consider not just model accuracy but the entire system. System design, system-level debugging, and treating the solution holistically are essential. Rather than focusing only on the model or treating the AI as a standalone experiment, we should embed it within business workflows and treat it as a software product and a system design effort.
Interviewer: You must have learned many lessons from working in high-stakes domains. Based on your experience deploying AI in critical areas such as financial crime and compliance, what are the most important lessons around trust and sustainable performance in enterprise platforms?
Pratima: One important point that can be overlooked is human-in-the-loop systems. Many organisations aim for AI-first systems, which can imply removing human decision-making. That is not how most real-world systems should function, especially in industries like payments and compliance that must be auditable and require explainable decisions. We cannot eliminate humans. Most production systems should keep humans in the loop. The best step is to create embedded workflows that help end users reach decisions faster. Rather than acting as a black box that makes decisions on their behalf, the AI should act as an accelerator that supports agents or other end users in arriving at decisions more quickly. I also read a survey indicating users are more likely to adopt new AI systems when they know the system will not replace them but will act as a companion that helps them reach results faster.
Interviewer: You are now a mentor and a Women in Tech ambassador, which is fantastic. As you support the next generation, what advice would you give to aspiring engineers who want to build long, fulfilling careers like yours?
Pratima: First, never underestimate the value of depth. Our industry moves quickly, and there is constant pressure to chase the latest technologies, but critical thinking, system design, algorithms, data structures, and debugging form the foundation of engineering depth. Do not neglect the fundamentals; they will carry you far. Second, especially for women, I encourage you to take up space with confidence. I have observed that many women hold back because of lower representation. Be more visible, contribute, speak publicly, and write. Take up space confidently.
Closing
Thank you to Pratima Upadhyay for sharing practical insights from deploying AI in production for compliance and risk management.
Her experience underscores that reliable systems come from strong evaluation, thoughtful orchestration and a relentless focus on reducing information fragmentation for analysts.
We look forward to seeing how these approaches continue to evolve following her session at The AI Summit London.















