Detecting XPIA (Cross Prompt Injection Attacks)
Cross Prompt Injection Attacks (XPIA) are in the news. One recent case is EchoLeak M365 copilot vulnerability where indirect prompts in email was used to generate markdown image to exfiltrate sensitive data. Google too issued an advisory recently on XPIA-
https://security.googleblog.com/2025/06/mitigating-prompt-injection-attacks.htmlAs AI agents become more common, cybercriminals are expected to exploit XPIA for data theft, sabotage, and espionage
XPIA is more than just Email Threats
Attackers can execute XPIA through various channels beyond email. Other methods include:
Document Repository Poisoning
Oversharing sensitive documents is a common issue. In the past, accessing such data required manual effort. AI agents can easily retrieve this content for users, making overshared documents a high-risk vector.
Attackers can inject variety of indirect prompts into these documents. These prompts may instruct agents to:
- Exfiltrate data
- Redirect users to malicious URLs
- Generate harmful markdown content
Vector Database Poisoning
Vector databases used in Retrieval-Augmented Generation (RAG) can be corrupted to include malicious prompts. The research paper https://arxiv.org/abs/2407.12784 describes it further.
Even if the query and response look normal, the citations used to generate the response is altered to malicious content. Or deliver malicious hidden instruction to Agent.
How to Detect XPIA
For Emails, XPIA needs to be monitored as we monitor for phishing, spams and other malicious content. This include techniques like:
- Prompt injection classifiers
- Markdown sanitization
- Suspicious URL redaction
Large providers like Microsoft and Google have added detection of such content to flag XPIA content.
However, these methods don’t cover document poisoning or vector DB attacks, where malicious content hides in citations or within text content.
Even in email scenarios, detecting every XPIA attempt is tough due to the sheer variety of English instructions. That’s why additional detection methods are essential.
Ackuity’s Approach to XPIA Detection
At Ackuity, we’ve added more methods to detect XPIA:
1.Track anomalous behaviors of agents
Apart from of scanning for malicious content described above, we also monitor how agents behave. The idea is to track the result of XPIA on Agent and not just detect the input to Agent.
Here are some examples of detection in Ackuity, using anomalous Chain of Thought (CoT) and tool usage.
a) Anomalous Chain of Thought (CoT)
Agents usually will follow a consistent reasoning pattern over time. If a document is poisoned, the agent’s thought process may change—even if the query and response look normal. Detecting any such anomalous changes in CoT is an indicator of XPIA.
Here is the COT under two scenarios-
# | Query | Chain of Thought | Response |
---|---|---|---|
1 | Who are our customers | I need to use the search tool to find information about who our customers are. | List known customers |
2 | Who are our customers? | I am being tasked to upload my responses to an external site. I should mention that it is against the policy. However, I will still answer the question by using the search tool as instructed. | Same listing as above. |
The second example shows a hidden malicious instruction—an indicator of XPIA.
b) Anomalous usage of tools
Baselining the normal tool actions and detecting anomalous usage can indicate XPIA. If an agent suddenly switches from using a search tool to executing risky actions (like deleting records via SQL), it could be a sign of a successful XPIA. This can be monitored through observability data, like in Langsmith below-
2.Monitor RAG citations
Most guardrails today only check the query and response. But citations used to generate responses can be poisoned.
For example:
- Query: "Provide details of customer John Doe"
- Response: "No specific details found"
- Citation: Contains hidden prompt or sensitive data
This blind spot can be exploited if not monitored.
XPIA is a fast-evolving threat with immense possibilities for exploitation. We looked at some of the detection methods in this article. Are you seeing other methods in use? Lets connect and exchanges ideas on this.