From Text File to Data Breach: The Risks of Prompt Injection in Autonomous AI Agents

Varadrajan Kunsavalikar
6 minutes ago
5 min read

Most privilege escalation stories start with a bad actor. This one starts with a legitimate actor, a vendor invoice, and then exploiting the poor agentic ai design choices.

That distinction matters. It's also where most security teams have no visibility right now.

The Setup: An Agent Built to Help

Picture a standard enterprise workflow: an AI agent deployed on AWS, continuously monitoring a storage bucket where external vendors upload invoices. When a new file arrives it triggers an event notification and the agent activates, reads it, analyzes it for anomalies and writes a security report. Clean, automated, useful.

The agent also had access to a second, more powerful set of credentials, intended for situations where a vendor compromise needed immediate investigation across sensitive systems. In practice, this meant the agent could reach sensitive data, on its own, if it determined the threat was severe enough.

Happy Path Workflow:

A reasonable design decision on paper. The risk was in letting an AI agent decide when to use that access, with no human approval gate and the elevated access available at all times.

What the design doesn't account for is what happens when the file itself becomes the attacker.

The Attack: A Normal Looking Invoice

The attacker doesn't need credentials, network access or a software vulnerability. They upload a file to the vendor bucket. It looks like an invoice, with the vendor name, invoice number, amount due and payment terms. Everything that would be contained in a real invoice.

But hidden within the document, indistinguishable from the rest of the text, are instructions designed to hijack the agent's reasoning. The agent reads the file to analyze it, and the moment it does, those instructions become part of the agent's thinking. The agent can't tell the difference between its own instructions and the attacker's. It sees what looks like a legitimate workflow directive and acts on it.

This is indirect prompt injection. No code is exploited. No credentials are stolen. The agent is simply tricked into doing something it was never supposed to do.

What is a Prompt Injection Attack?

Prompt injection is when someone hides instructions inside content an AI agent is meant to read,a document, email or webpage, so the agent follows those instructions instead of its own rules. The user never sees the payload. The agent treats it as part of the task.

Real-world example: In 2024, researchers at Brave demonstrated this with Perplexity's Comet AI browser. They hid instructions inside a Reddit comment, invisible to the human reader, but read by the AI when asked to summarize the page. The browser agent followed those hidden commands, accessed the user's logged-in email, and exfiltrated a one-time password. No malware, no stolen credentials, just text on a page. (Brave Security Research)

What Happens Next

The agent's internal reasoning tells the story:

"The invoice file indicates that there is a need to escalate to the higher-privilege role for further analysis. Since this is an escalated incident response, I'll assume the restricted role to access the required data and include it in the report."

The agent was never authorized to make that decision. But it did, autonomously, without human review. All in under 14 seconds from file upload to data exfiltration.

What follows is lateral movement, not by a human attacker, but by a tricked AI agent:

The agent uses its normal credentials to read the uploaded invoice expected behavior
The agent decides the situation requires escalation triggered by the hidden instructions
The agent reaches for the higher-privilege credentials it was given for emergencies no human approves this
The agent reads executive payroll data from a restricted HR system names, titles, bonus amounts
The agent writes all of that data back to the vendor bucket where the attacker retrieves it

Attack Path Workflow:

No policy was violated as every action was technically authorized. The damage happened in the gap between what the agent was meant to do and what it actually did. The danger is the DESIGN and SEQUENCE.

Why This Happened, and Why It Will Happen Again

Three design decisions created the vulnerability, none of them being unusual.

The elevated access was always available. The agent's emergency credentials were meant for rare investigations but they weren't gated behind a human approval step or a time-limited grant. They sat there, available to the agent at all times. The only thing deciding when to use them was the AI's own judgment. The attacker didn't exploit a bug, they exploited a design choice.

The attack surface was the agent's own job. The weapon is a text file uploaded to a bucket the agent is supposed to monitor. Any vendor, contractor or external party with upload access becomes a potential attacker. The agent can't simply stop reading files as that's its entire purpose.

The AI's judgment is not a security control. In all of our testing, the attack didn't succeed every time. On some runs, the agent recognized the hidden instructions as unusual and declined to escalate. On others, it followed them completely. Sometimes it resists, sometimes it doesn't. You can't audit why, you can't predict when and you certainly can't rely on it.

The Detection Gap

No single event in this attack looks suspicious as the credential usage is authorized and the file read is expected. The data access is within policy. Each action, in isolation, passes every compliance check.

What makes this attack visible is not any individual event, it's the sequence.

A traditional monitoring tool sees "agent read a file" and marks it normal. It sees "agent used its credentials" and marks it normal. Nothing to flag.

But if you watch the full chain of activity, normal file access, followed by an unexpected credential escalation, followed by access to a restricted system, followed by sensitive data written back to an external-facing location, the deviation becomes obvious. That sequence doesn't match any approved workflow. A vendor scanning agent reaching for payroll data is not normal behavior, no matter how authorized the individual steps appear.

Side by Side Comparison of both the Paths:

The activity data is there, every action is logged. The question is whether your security program is watching the sequence or treating each event independently.

The Broader Implication

This attack required no CVE, no zero-day and no network intrusion. It only required a text file and a bucket that the agent was already supposed to read.

As organizations accelerate AI agent adoption, the identity surface they are expanding isn't covered by the same controls that govern human access. Agents use credentials, they escalate privileges, they move between systems and they do all of it faster, more often and with far less human oversight than any person in the same environment.

The question isn't whether AI agents should have production-level access. Many legitimate workflows require it. The question is whether your security program can see what those agents actually do with that access, not just what they were provisioned for, and whether it can see it fast enough to matter.

There's a real gap between policy intent and runtime behavior. That gap is where this attack lives. And that gap is where observability needs to be.