
🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4 · Curated by Tom Hundley
OAuth 2.1 for MCP is only half the job. The harder part is managing tokens safely once agents start spawning, retrying, delegating work, and failing under load. In production, the problems are predictable: too many agents refreshing at once, overly broad tokens passed between services, brittle retry logic when the identity provider slows down, and poor audit trails when a sub-agent touches a sensitive MCP server.
This article focuses on those runtime concerns. It covers practical patterns for token acquisition, rotation, delegation, revocation, and proof-of-possession in MCP-based agent systems. If you already understand the protocol basics, this is the next layer: how to keep token handling reliable, scoped, and observable when the clients are autonomous agents rather than humans clicking through a browser flow. For protocol-level background, see our MCP 2.0 enterprise implementation guide.
TL;DR: Human OAuth flows assume interactive sessions; agent OAuth requires automated acquisition, proactive rotation, and tightly scoped delegation.
The OAuth 2.1 framework commonly used with MCP deployments centers on a few patterns relevant to agent systems: client_credentials for machine-to-machine access, authorization_code with PKCE for user-delegated access, and token exchange as defined in RFC 8693 for delegation between services or agents where the identity provider supports it.
The challenge is operational, not conceptual. A single user request can trigger an orchestrator agent that fans out to multiple tool-calling agents, each connecting to different MCP servers and each needing a token with the right audience and scope. That means your token layer has to handle concurrency, short lifetimes, retries, and downscoping without assuming a human is present to fix things.
A sound default is simple: prefer short-lived, narrowly scoped tokens over long-lived broad ones. A leaked token with a five-minute lifetime and read-only scope is a contained problem. A long-lived token with broad administrative access is an incident.
| Agent Pattern | Grant Type | Typical Token Lifetime | Refresh Strategy |
|---|---|---|---|
| Background service agent | `client_credentials` | Often 5-60 minutes | Re-acquire on expiry or shortly before |
| User-delegated agent | `authorization_code` + PKCE | IdP-defined | Rotate according to provider policy |
| Sub-agent in orchestration chain | Token exchange (RFC 8693) | Often very short-lived | Re-delegate from parent when needed |
| Long-running pipeline agent | `client_credentials` | Often 5-60 minutes | Background rotation with jitter |
TL;DR: Centralize token acquisition and caching in one token manager that handles rotation, jitter, and concurrent-request deduplication.
Don't scatter OAuth logic across your agent codebase. Build a token manager that every agent process calls. The exact implementation will vary by language and identity provider, but the design principles are consistent.
```python
import asyncio
import hashlib
import time
from dataclasses import dataclass, field
from typing import Optional

import httpx


@dataclass
class TokenEntry:
    access_token: str
    expires_at: float
    scopes: frozenset[str]
    refresh_token: Optional[str] = None
    issued_at: float = field(default_factory=time.time)

    @property
    def ttl(self) -> float:
        return max(0.0, self.expires_at - self.issued_at)

    @property
    def needs_rotation(self) -> bool:
        """Rotate at roughly 75% of TTL, pulled earlier by a deterministic
        per-token jitter so a fleet of agents does not refresh in lockstep."""
        if self.ttl <= 0:
            return True
        jitter = int(hashlib.sha256(self.access_token.encode()).hexdigest()[:8], 16) % 30
        rotate_at = self.issued_at + (self.ttl * 0.75) - jitter
        return time.time() >= rotate_at

    @property
    def is_expired(self) -> bool:
        return time.time() >= self.expires_at


class MCPTokenManager:
    def __init__(self, token_endpoint: str, client_id: str, client_secret_ref: str):
        self._token_endpoint = token_endpoint
        self._client_id = client_id
        # Never hardcode secrets; hold a reference and resolve it
        # from your secrets manager at acquisition time.
        self._client_secret_ref = client_secret_ref
        self._cache: dict[str, TokenEntry] = {}
        self._pending: dict[str, asyncio.Task] = {}

    def _cache_key(self, scopes: frozenset[str], audience: str) -> str:
        return f"{audience}:{':'.join(sorted(scopes))}"

    async def get_token(self, scopes: set[str], audience: str, secret_client=None) -> str:
        key = self._cache_key(frozenset(scopes), audience)
        entry = self._cache.get(key)
        if entry and not entry.needs_rotation and not entry.is_expired:
            return entry.access_token
        # Deduplicate concurrent acquisitions: if another coroutine is already
        # fetching this token, await its result instead of issuing a second
        # request to the identity provider.
        if key in self._pending:
            entry = await self._pending[key]
            return entry.access_token
        task = asyncio.create_task(self._acquire_token(scopes, audience, secret_client))
        self._pending[key] = task
        try:
            entry = await task
            self._cache[key] = entry
            return entry.access_token
        finally:
            self._pending.pop(key, None)

    async def _acquire_token(self, scopes: set[str], audience: str, secret_client=None) -> TokenEntry:
        client_secret = await self._resolve_secret(self._client_secret_ref, secret_client)
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.post(
                self._token_endpoint,
                data={
                    "grant_type": "client_credentials",
                    "client_id": self._client_id,
                    "client_secret": client_secret,
                    "scope": " ".join(sorted(scopes)),
                    "audience": audience,
                },
            )
            resp.raise_for_status()
            data = resp.json()
            return TokenEntry(
                access_token=data["access_token"],
                expires_at=time.time() + data["expires_in"],
                scopes=frozenset(scopes),
                refresh_token=data.get("refresh_token"),
            )

    async def _resolve_secret(self, ref: str, secret_client=None) -> str:
        if secret_client:
            return await secret_client.read_secret(ref)
        raise ValueError("No secret resolver configured")
```

Three design decisions matter here:

1. **Proactive rotation with jitter.** Tokens rotate at roughly 75% of their TTL, offset by a deterministic per-token jitter, so a fleet of agents does not refresh in lockstep and hammer the identity provider.
2. **Concurrent-request deduplication.** If many coroutines need the same token at once, only one request reaches the token endpoint; the rest await the pending task.
3. **Secrets stay in the secrets manager.** The manager holds a secret *reference*, not the secret itself, and resolves it only at acquisition time.
TL;DR: Use token exchange where supported to mint downscoped, short-lived tokens for sub-agents instead of passing parent tokens downstream.
In a typical orchestration pattern, an orchestrator agent receives a user-delegated or service token and dispatches work to specialized sub-agents. The insecure pattern is handing the same token to every sub-agent. The safer pattern is to exchange it for a narrower token with a shorter lifetime and a more specific audience.
```python
import httpx


async def delegate_token(
    token_endpoint: str,
    parent_token: str,
    target_scopes: set[str],
    target_audience: str,
    actor_token: str | None = None,
) -> str:
    """
    Exchange a parent token for a narrowly scoped sub-agent token
    using RFC 8693, where the identity provider supports it.
    """
    data = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": parent_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "scope": " ".join(sorted(target_scopes)),
        "audience": target_audience,
    }
    # Per RFC 8693, actor_token must itself be a security token representing
    # the acting party (the sub-agent), and it must be accompanied by
    # actor_token_type -- a bare identifier string is not valid here.
    if actor_token:
        data["actor_token"] = actor_token
        data["actor_token_type"] = "urn:ietf:params:oauth:token-type:access_token"
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.post(token_endpoint, data=data)
        resp.raise_for_status()
        return resp.json()["access_token"]
```

The exact request shape depends on your provider. Some identity platforms support RFC 8693 directly; others expose adjacent patterns such as on-behalf-of flows. The important design rule is the same: sub-agents should receive their own token, scoped to their own task.
Delegation depth also matters. Keep the chain shallow, document it, and enforce policy in the identity layer or gateway. Deep delegation trees are hard to audit and easy to misconfigure. That aligns with the trust-boundary concerns discussed in our piece on rebuilding agent trust.
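Depth limits can be enforced mechanically at the gateway. A minimal sketch, assuming a hypothetical `delegation_chain` claim that each exchange appends an actor identifier to (the claim name is illustrative, not a standard):

```python
MAX_DELEGATION_DEPTH = 3  # policy choice: keep chains shallow and auditable


def check_delegation_depth(claims: dict) -> None:
    """Reject a token exchange when the delegation chain is already too deep.

    Assumes the gateway records each hop in a custom 'delegation_chain'
    claim (a list of actor identifiers). Raises PermissionError on violation.
    """
    chain = claims.get("delegation_chain", [])
    if len(chain) >= MAX_DELEGATION_DEPTH:
        raise PermissionError(
            f"Delegation depth {len(chain)} exceeds limit {MAX_DELEGATION_DEPTH}: "
            + " -> ".join(chain)
        )
```

Failing fast here keeps every delegation tree within a bound you can actually audit, rather than discovering a six-hop chain during an incident review.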
TL;DR: Map MCP scopes to your existing RBAC model and use native identity-provider features for delegation and policy enforcement.
Most enterprise teams will integrate MCP-facing services with Microsoft Entra ID, Okta, Auth0, or another standards-based identity provider. The details differ by vendor, but the implementation pattern is consistent:
```yaml
mcp_scopes:
  "mcp:tools:execute":
    description: "Invoke tools on MCP servers"
    enterprise_roles: ["ai-agent-operator", "platform-admin"]
  "mcp:tools:read":
    description: "List and describe available tools"
    enterprise_roles: ["ai-agent-viewer", "ai-agent-operator", "platform-admin"]
  "mcp:resources:read":
    description: "Read MCP resources"
    enterprise_roles: ["ai-agent-operator", "data-analyst", "platform-admin"]
  "mcp:prompts:execute":
    description: "Invoke prompt templates"
    enterprise_roles: ["ai-agent-operator", "platform-admin"]

server_audiences:
  "billing-mcp":
    allowed_scopes: ["mcp:tools:execute", "mcp:resources:read"]
    max_token_ttl: 300   # seconds
  "analytics-mcp":
    allowed_scopes: ["mcp:resources:read"]
    max_token_ttl: 900   # seconds
```

This keeps authorization understandable. Your identity provider remains the source of truth for authentication and coarse-grained authorization, while your MCP gateway or policy engine can enforce finer-grained rules such as tool-level restrictions, environment boundaries, or deny lists.
Avoid building a parallel authorization system unless you have a clear reason. Reusing enterprise RBAC reduces drift, simplifies audits, and makes incident response faster.
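A gateway-side check against such a mapping can stay small. Here is a sketch with the `server_audiences` section loaded into a plain dict (audience names and TTLs mirror the example config; in practice you would parse the YAML at startup):

```python
# Mirrors the server_audiences section of the example config.
SERVER_AUDIENCES = {
    "billing-mcp": {
        "allowed_scopes": {"mcp:tools:execute", "mcp:resources:read"},
        "max_token_ttl": 300,
    },
    "analytics-mcp": {
        "allowed_scopes": {"mcp:resources:read"},
        "max_token_ttl": 900,
    },
}


def validate_token_request(audience: str, scopes: set[str], requested_ttl: int) -> int:
    """Check a token request against per-server policy; return the granted TTL."""
    policy = SERVER_AUDIENCES.get(audience)
    if policy is None:
        raise PermissionError(f"Unknown MCP server audience: {audience}")
    disallowed = scopes - policy["allowed_scopes"]
    if disallowed:
        raise PermissionError(f"Scopes not permitted for {audience}: {sorted(disallowed)}")
    # Clamp rather than reject: grant at most the server's maximum TTL.
    return min(requested_ttl, policy["max_token_ttl"])
```

Clamping the TTL instead of rejecting over-long requests keeps well-behaved agents working while still enforcing the per-server ceiling.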
TL;DR: DPoP reduces replay risk by binding token use to a cryptographic key held by the calling agent.
Bearer tokens can be replayed by anyone who gets hold of them. DPoP, defined in RFC 9449, improves that model by requiring the client to prove possession of a private key when requesting or using a token. In agent systems, that can be valuable when tokens move across process boundaries or when you want stronger guarantees that a stolen token cannot be reused elsewhere.
```python
import base64
import hashlib
import time
import uuid

import jwt  # PyJWT
from cryptography.hazmat.primitives.asymmetric import ec


class DPoPManager:
    def __init__(self):
        self._private_key = ec.generate_private_key(ec.SECP256R1())
        self._public_jwk = self._export_public_jwk()

    def create_proof(self, http_method: str, target_uri: str, access_token: str | None = None) -> str:
        headers = {"typ": "dpop+jwt", "alg": "ES256", "jwk": self._public_jwk}
        payload = {
            "jti": str(uuid.uuid4()),
            "htm": http_method.upper(),
            "htu": target_uri,
            "iat": int(time.time()),
        }
        if access_token:
            # Bind the proof to the access token via its hash (RFC 9449 "ath").
            payload["ath"] = self._sha256_base64url(access_token)
        return jwt.encode(payload, self._private_key, algorithm="ES256", headers=headers)

    def _export_public_jwk(self) -> dict:
        public_numbers = self._private_key.public_key().public_numbers()
        x = public_numbers.x.to_bytes(32, "big")
        y = public_numbers.y.to_bytes(32, "big")
        return {
            "kty": "EC",
            "crv": "P-256",
            "x": base64.urlsafe_b64encode(x).rstrip(b"=").decode(),
            "y": base64.urlsafe_b64encode(y).rstrip(b"=").decode(),
        }

    def _sha256_base64url(self, value: str) -> str:
        digest = hashlib.sha256(value.encode()).digest()
        return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
```

DPoP is not universal across providers or SDKs, so treat it as a targeted hardening measure rather than a blanket assumption. Where supported, it is especially useful for high-value MCP integrations and delegated agent workflows.
**How should agents handle token rotation during long-running operations?**

Do not interrupt an in-flight call just to rotate a token. Rotate proactively before expiry so the next request uses a fresh token. If rotation fails temporarily, continue using the current token until it expires, then retry with backoff and reacquire.
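For the retry-with-backoff part, a capped exponential schedule is usually sufficient. A hypothetical sketch (the base and cap values are illustrative policy choices):

```python
def backoff_schedule(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry `attempt` (0-based): exponential growth,
    capped so a slow identity provider isn't hammered during an outage."""
    return min(cap, base * (2 ** attempt))
```

In production you would typically add random jitter on top of this schedule so that many agents retrying after the same IdP incident do not synchronize.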
**Does every agent instance need its own OAuth client registration?**

Usually no. Per-instance registrations create operational overhead. A better default is one client registration per agent class or workload type, combined with short-lived tokens, strong audit logging, and instance-level identity in claims or telemetry.
**Is token exchange required for multi-agent systems?**

No. Token exchange is a useful pattern for multi-agent delegation, but support depends on your identity provider and architecture. If your provider does not support RFC 8693 directly, use the closest native delegated-access pattern it offers.
**How is OAuth for MCP different from standard OAuth?**

The core OAuth mechanisms are the same. What changes is how you apply them: MCP deployments need clear scope design, audience boundaries for MCP servers, and operational patterns that work for autonomous clients rather than interactive users.
**Should you build a custom token exchange service?**

In most cases, no. If your identity provider already supports token exchange or delegated-access flows, use that. A custom exchange service adds security-critical logic that is easy to get wrong.
The patterns above are the difference between a working demo and a production-ready agent platform. Token handling becomes infrastructure quickly: it affects reliability, auditability, blast radius, and incident response.
If your team is deploying MCP servers behind enterprise identity providers, Elegant Software Solutions can help design the token lifecycle, delegation model, and policy controls around your specific environment. The protocol is standardized; the production details are not. If you need help implementing those details, explore our AI implementation engagements.