Multi-Agent AI Systems: The Attack Surface Nobody's Talking About

AI Agents Are Talking to Each Other — And Attackers Are Listening

Multi-agent AI systems are no longer experimental. Businesses across financial services, legal, healthcare, and technology are deploying AI architectures where individual agents specialise in discrete tasks — one handles data retrieval, another interprets results, a third takes action. Platforms like Amazon Bedrock make it straightforward to build and orchestrate these systems at scale. The problem is that security thinking hasn't kept pace. Most organisations assess AI risk at the model level: is the model safe? Is the data it was trained on clean? Those are reasonable questions, but they miss the real threat. In a multi-agent system, the attack surface is the communication between agents, and that surface is largely unguarded. Research published by Palo Alto Networks' Unit 42 in 2025 examined multi-agent architectures on Amazon Bedrock specifically, and the findings should give any CISO pause. Attackers don't need to compromise the model itself. They need to inject malicious instructions into the data or text that an agent processes — and those instructions can then propagate across an entire agent network.

What Is Prompt Injection, and Why Does It Scale So Badly in Multi-Agent Systems?

Prompt injection is an attack technique where malicious text — hidden in a document, a web page, a database record, or any other content an AI agent reads — instructs the agent to behave in ways its developers never intended. Think of it as social engineering, but directed at software instead of people. In a single-agent system, the blast radius of a successful prompt injection is relatively contained. The agent reads the malicious instruction, acts on it, and the damage is limited to whatever that one agent could do. In a multi-agent system, the dynamic changes entirely. A supervisor agent receives output from a compromised sub-agent and treats it as trusted input. That output may contain injected instructions designed to manipulate the supervisor's behaviour. The supervisor then passes instructions to further agents downstream. One successful injection can cascade across an entire automated workflow before any human reviews a single output. Unit 42's research specifically identified this trust propagation problem as a defining characteristic of multi-agent risk. When Agent A trusts Agent B and Agent B trusts Agent C, a compromise at the entry point of the chain can reach every node in it. Traditional access controls — which assume human actors — don't map cleanly onto agent-to-agent communication.

How Attackers Exploit Multi-Agent Architectures in Practice

The attack patterns that emerge from multi-agent systems aren't theoretical. They follow a consistent logic that security teams can model and defend against — but only if they understand the mechanics first. The most common vector is indirect prompt injection via external data. An agent tasked with summarising documents, browsing the web, or querying a database will process content it retrieves from the environment. If an attacker can influence that content — through a poisoned webpage, a manipulated document in a shared repository, or a crafted database record — they can deliver instructions to the agent without touching the model or the platform directly. From there, the injected instructions typically attempt one or more of the following:

Privilege escalation: instructing an agent to request permissions or access resources beyond its intended scope
Data exfiltration: directing an agent to extract sensitive information and route it to an attacker-controlled endpoint
Action hijacking: overriding the agent's intended task to perform a different action — deleting records, triggering transactions, or sending communications
Lateral movement: crafting output that a downstream agent will process, extending the attack further through the workflow

Why Amazon Bedrock Specifically Raises the Stakes

Amazon Bedrock is one of the most widely adopted platforms for building production AI applications. It offers access to foundation models from multiple providers — Anthropic, Meta, Mistral, Amazon's own Nova models — alongside infrastructure for agent orchestration, memory, and tool use. Its accessibility is precisely what makes it a target-rich environment. Unit 42's research found that Bedrock's multi-agent architecture, specifically its support for supervisor agents that coordinate networks of sub-agents, creates conditions where trust assumptions are baked into the design. A supervisor agent receiving a response from a sub-agent has no native mechanism to verify that the response hasn't been manipulated in transit or at the source. This isn't a flaw unique to Amazon. It reflects a broader immaturity in how the AI industry has approached inter-agent security. The OWASP Top 10 for Large Language Model Applications, first published in 2023 and updated in 2025, lists prompt injection as the number one risk. Yet most deployment guides for agentic AI frameworks focus on capability configuration, not security hardening. For businesses using Bedrock to build customer-facing tools, internal automation, or data processing pipelines, the implication is straightforward: the AI system you deployed last quarter may be processing attacker-controlled content right now, and you likely have no visibility into whether injected instructions have influenced its behaviour.

What Security Controls Actually Apply Here?

The security community is still developing a mature framework for agentic AI threats, but several controls are immediately applicable and should be on every team's checklist. First, apply least-privilege principles to agent permissions with the same rigour you'd apply to human user accounts. If an agent's job is to read documents and produce summaries, it should have no write access, no ability to make API calls to external services, and no access to data stores outside its defined scope. Many Bedrock deployments grant agents broad permissions for convenience during development and never tighten them before production. Second, treat agent-to-agent communication as untrusted by default. Output from one agent should be validated before it's passed to another, particularly if that output includes instructions or directives rather than pure data. This is uncomfortable because it partially defeats the purpose of automation, but it's the correct security posture. Third, log everything. Multi-agent systems that operate without comprehensive audit trails are essentially black boxes. You cannot detect prompt injection after the fact if you have no record of what each agent received, what it was instructed to do, and what actions it took. Amazon Bedrock supports integration with CloudTrail and CloudWatch, but that logging must be explicitly configured and actively monitored. Fourth, map your AI attack surface before attackers do. Every external data source an agent can read, every API it can call, every document repository it can access represents a potential injection point. That surface needs to be inventoried, assessed, and continuously monitored for changes.

The Bigger Pattern: AI Is Expanding Your Attack Surface Faster Than You Can Track It

Multi-agent AI systems are one specific manifestation of a broader problem. Every AI tool your organisation adopts — whether you build it or buy it — extends your attack surface in ways that don't show up in traditional vulnerability scans or penetration tests. A SaaS application that uses an AI agent to process incoming emails now has an email-shaped injection vector. A customer service platform that uses an LLM to summarise support tickets is passing customer-submitted text directly into an AI decision-making layer. A finance tool that uses AI to process invoices is reading attacker-controlled documents. None of these vectors would appear in a conventional scan for unpatched software or misconfigured firewalls. They require a different kind of visibility — continuous, contextual, and focused on how your systems interact with the outside world. This is the gap that attack surface management tools are designed to address. Platforms like Hadrian, which continuously maps your exposed assets and tests them against real-world attack techniques, can help organisations understand which of their AI-connected systems are reachable from the internet and what an attacker could do if they reached them. As AI adoption accelerates, that kind of external-facing visibility stops being optional.

How to Protect Your Business Against AI Agent Attacks

If your organisation uses any AI tool that processes external content — documents, emails, web data, customer inputs — you are already exposed to some variation of the risks described here. The question is whether you have the visibility to know it. Hadrian's continuous attack surface management platform can help you identify which AI-connected systems are externally reachable and model the attack paths an adversary could take to reach them. Rather than waiting for a scheduled penetration test, Hadrian runs continuously, surfacing new exposure as your environment changes. For organisations building or deploying agentic AI tools, that continuous posture is more appropriate than point-in-time assessments. For broader threat detection across your environment, Sophos MDR provides 24/7 monitoring and response capability that doesn't rely on AI agents to catch threats in your AI agents. Human analysts reviewing telemetry from your endpoints, network, and cloud infrastructure can identify anomalous behaviour — unusual data transfers, unexpected API calls, out-of-pattern system actions — that automated systems trained on normal behaviour might miss. If your concern is specifically about data leaving your environment through compromised AI workflows, BlackFog's anti data exfiltration capability monitors and blocks unauthorised outbound data transfers at the device level, regardless of the mechanism — including AI agents making unexpected API calls to external endpoints. You can assess your current data exfiltration exposure in two minutes at /data-exfiltration-risk. The organisations that get this right won't be the ones that avoid AI. They'll be the ones that apply the same security discipline to AI systems that they apply to every other part of their infrastructure. If you want to understand where your current exposure sits, talk to the Kyanite Blue team at /contact for a no-obligation assessment of your security posture.

Frequently Asked Questions

What is prompt injection in AI systems and why is it dangerous?

Prompt injection is an attack where malicious text embedded in content processed by an AI agent — such as a document, web page, or database record — overrides the agent's intended instructions. In multi-agent systems, a single successful injection can cascade across an entire automated workflow, causing data theft, unauthorised actions, or system manipulation before any human intervenes.

How does Amazon Bedrock's multi-agent system create security risks?

Amazon Bedrock allows businesses to build networks of AI agents where a supervisor agent coordinates multiple sub-agents. Because these agents pass outputs to one another as trusted inputs, an attacker who injects malicious instructions into any data source an agent reads can influence the entire chain. Bedrock has no native mechanism to verify that agent-to-agent communication hasn't been tampered with.

What security controls should organisations apply to multi-agent AI systems?

Organisations should apply least-privilege permissions to every agent, treating agent-to-agent outputs as untrusted by default. Comprehensive logging of all agent inputs, instructions, and actions is essential for post-incident analysis. Continuous attack surface mapping helps identify which AI-connected systems are externally reachable and therefore vulnerable to indirect prompt injection via attacker-controlled content.

AI securityprompt injectionAmazon Bedrockmulti-agent AIattack surface management