
🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4 · Curated by Tom Hundley
If one AI tool in your workflow can read untrusted input, write code, or trigger downstream automation, it can become a stepping stone for a broader compromise. That's the core idea behind an agent-to-agent attack: an attacker influences one system, and trust between connected tools helps the malicious change spread.
In practice, that might mean a coding assistant acts on a poisoned GitHub issue, a chat-connected bot forwards bad instructions, or an automated pipeline treats AI-generated changes as trustworthy because they came from an internal tool. The exact path varies, but the risk is the same: too much automation, too much trust, and not enough human review.
If you're building with tools like Cursor, Replit, v0, or Bolt, the safest assumption is simple: any tool with broad permissions and access to external content needs tighter controls. The good news is that you do not need to stop using AI tools. You need to limit what they can access, require human approval for sensitive actions, and treat AI-to-AI handoffs as security boundaries.
TL;DR: Attackers usually do not need to break your AI tools directly. They can influence one tool with untrusted input, then let automation and trust relationships carry the damage forward.
Think of your AI tools like coworkers with overlapping responsibilities. One tool reads messages or tickets. Another writes code. Another runs tests or opens pull requests. A deployment system may publish whatever passes the pipeline. If each step assumes the previous one is trustworthy, a bad instruction can move surprisingly far.
A typical chain looks like this:
This pattern overlaps with well-documented AI security risks, including prompt injection, excessive permissions, and over-trusting tool output. OWASP's guidance for LLM applications has consistently warned about excessive agency and insecure tool use as major risk areas. The specific label "agent-to-agent attack" is still emerging and not yet a formal standard category, but the underlying mechanics are real and align with known attack classes.
TL;DR: The biggest risk is not that AI tools exist. It's that many builders accept default permissions, integrations, and automation without checking what those tools can actually do.
If you started with an AI-first workflow, you may have connected tools quickly so you could move faster. That convenience can create a risky setup, especially when one tool can read external content and another can change code or trigger deployment.
Common high-risk capabilities include:
Here's a practical way to think about the risk:
| Agent Permission Level | What It Can Do | Risk If Exploited |
|---|---|---|
| Read-only on current file | View a limited part of the codebase | Lower risk, but can still expose sensitive code or context |
| Read/write on project | Edit files across the repository | High risk: can inject malicious logic or weaken controls |
| Package installation or command execution | Add dependencies or run scripts | Critical risk: can introduce malware or alter the environment |
| External communication or content ingestion | Read issues, messages, or third-party content | Critical risk: can be manipulated by prompt injection or spoofed instructions |
| Deployment or merge access | Push changes toward production | Severe risk: can turn a bad suggestion into a live incident |
If you have not reviewed these permissions manually, assume they are broader than necessary. That's especially important for solo builders and small teams moving quickly. For a broader look at permission scoping and review habits, see AI Agent Security for Vibe Coders.
TL;DR: Most real-world examples look ordinary at first: a bug report, a patch suggestion, a support message, or a bot action that seems routine.
Here are three patterns worth watching.
An attacker submits a GitHub issue, support request, or task description containing hidden instructions or misleading remediation steps. A coding agent reads the text as part of its context and makes changes beyond the stated task.
How to spot it: If the AI modifies files unrelated to the request, adds network calls, changes authentication logic, or introduces new dependencies without a clear reason, stop and review manually. We covered this broader pattern in Hidden Prompts in GitHub Issues.
An attacker shares a supposed patch for a vulnerability and frames it as urgent. The message may reference a real package or a plausible CVE format, but the code itself introduces a backdoor, weakens validation, or exfiltrates data.
How to spot it: Verify security fixes against the official maintainer repository, vendor advisory, or a trusted vulnerability database. Do not trust a patch just because it sounds urgent or cites a CVE-like identifier.
An attacker creates a bot, action, or integration with a name that resembles a legitimate tool. Humans and AI systems alike may treat it as trustworthy if the name looks familiar.
How to spot it: Check the actual publisher, installation source, permissions requested, and account identity. Similar names are not proof of legitimacy.
These patterns also overlap with software supply chain risk. If an AI tool can install packages or follow dependency suggestions automatically, the blast radius gets larger. For a related example, see the ClawHavoc supply chain attack.
TL;DR: Reduce permissions, isolate external inputs, and require human approval before code is merged or deployed.
Here is the practical checklist.
Open every AI tool you use and document what it can read, write, execute, install, and connect to. Include repository scope, shell access, package management, issue access, chat integrations, and deployment permissions.
Do not let a coding agent act directly on Slack messages, Discord posts, email, public issues, or scraped web content without human review. External content is untrusted input.
Auto-deploy is not always wrong, but auto-deploying AI-generated changes without review is a bad idea. At minimum, require human approval before merging security-sensitive changes or promoting code to production.
Before accepting changes, inspect:
A self-audit prompt is not a security control by itself, but it can help surface surprises.
Prompt to paste into your AI tool:
Review all the changes you just made to my project. List every file you modified, every dependency you added or changed, every command you ran, and every external URL or API endpoint referenced in the new code. Flag anything that affects authentication, authorization, payments, secrets, deployment, or data sent outside this application.
Use that summary as a starting point, not as proof that the changes are safe.
TL;DR: These attacks are less about one specific product and more about how much authority, connectivity, and trust you give your tools.
It is a chain attack where one AI-enabled system is influenced by malicious or untrusted input, and downstream tools amplify the damage because they trust the earlier step. The attack may involve coding assistants, bots, CI systems, ticketing tools, or deployment automation.
Not inherently. The risk depends on configuration: what the tool can access, whether it can act on external content, and whether a human reviews the output before merge or deploy. Broad permissions and unattended automation increase risk.
Look for changes you did not request, especially around authentication, secrets handling, network calls, CI configuration, dependencies, or deployment settings. Also review logs for unusual bot actions, unexpected pull requests, or changes triggered from external content sources.
Yes. Treat each handoff as a trust boundary. Limit permissions, isolate untrusted inputs, require approval for sensitive actions, and log what each tool did. Multiple tools are manageable if they are not allowed to silently approve one another.
For most teams, it is a mandatory human approval step before high-impact actions such as merging privileged changes, rotating secrets, changing infrastructure, or deploying to production. That breaks the automatic cascade.
AI tools can speed up development, but they also compress the time between bad input and real impact. When one tool can influence another, convenience becomes a security design problem.
The fix is not to abandon AI-assisted development. It is to build safer boundaries: narrower permissions, cleaner separation between external content and code changes, and explicit human approval for high-risk actions.
If your team is adopting AI-assisted workflows and wants help designing safer guardrails, Elegant Software Solutions can help you assess permissions, review automation paths, and build practical controls that do not kill velocity.
Share this with someone building with AI tools. They may not realize how much trust exists between the systems in their workflow.
Discover more content: