← Bookmarks 📄 Article

Eurostar AI vulnerability: when a chatbot goes off the rails | Pen Test Partners

A security researcher bypassed Eurostar's AI chatbot guardrails by exploiting a fundamental flaw: the system only verified the latest message's signature, allowing earlier conversation history to be manipulated—then Eurostar accused them of blackmail during disclosure.

· ai ml
Read Original
Listen to Article
0:001:11
Summary used for search

• Eurostar's chatbot had guardrails and cryptographic signatures, but only validated the most recent message—earlier messages in the chat history could be edited client-side and fed directly to the model as trusted context
• By sending a harmless latest message (which passed the guard), the researcher could modify previous messages to inject prompts that leaked the system prompt, model name, and enabled HTML injection for self-XSS
• The disclosure process was a disaster: initial reports went unanswered, Eurostar lost the disclosure during a VDP transition, then accused the researcher of blackmail despite proper responsible disclosure
• The core lesson: traditional web security flaws (weak input validation, client-side trust, improper signature binding) still apply when LLMs are involved—AI doesn't excuse skipping security fundamentals

A security researcher discovered four vulnerabilities in Eurostar's public AI chatbot while using it as a legitimate customer. The chatbot appeared well-designed with guardrails, cryptographic signatures, and UUIDs for messages and conversations. However, the implementation had a critical flaw: the server only verified the signature on the latest message in the conversation history, never re-validating older messages. This meant an attacker could send a harmless message that passed the guardrail check, then edit earlier messages in the chat_history array to inject malicious payloads that would be fed directly to the model as trusted context.

Using this technique, the researcher bypassed guardrails to extract the underlying model name and system prompt through prompt injection. They also demonstrated HTML injection that could lead to self-XSS, and found that conversation and message IDs were not properly validated—the server accepted simple values like "1" or "hello" instead of enforcing proper UUIDs. While these issues weren't immediately critical given the chatbot's limited functionality, they created clear paths to more serious attacks if the system were expanded to handle personal data or account details.

The disclosure process was remarkably painful despite Eurostar having a published vulnerability disclosure programme. The initial report went unanswered for a month. When escalated via LinkedIn, Eurostar claimed no record of the disclosure—they had outsourced their VDP and lost reports during the transition. Most bizarrely, Eurostar accused the researcher of attempted blackmail, despite the researcher simply following responsible disclosure practices and requesting acknowledgment. The core takeaway is that traditional web security principles—server-side validation, proper signature binding, input sanitization—still apply when LLMs are in the loop. AI features don't get a pass on security fundamentals.