
🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4
A rotating refresh token can break an otherwise healthy OAuth integration in two predictable ways: failing to persist the newly issued refresh token, and reading a stale token back from cache after the write. Both failures often surface as invalid_grant, which makes the problem look like a revoked app or bad client credentials when the real issue is state management. If a provider rotates refresh tokens on each successful refresh, the old token may become unusable immediately or within a short grace window, depending on the provider. Either way, the safe implementation pattern is the same: persist the replacement token durably, treat the write as part of the refresh transaction, and ensure subsequent reads bypass or invalidate any stale cache.
This article walks through those two failure modes, why they are easy to misdiagnose, and the implementation rules that reduce both lockouts and token-handling risk.
TL;DR: If a provider rotates refresh tokens and your code does not persist the replacement token, the next refresh can fail with invalid_grant even though your client credentials are correct.
Many OAuth 2.0 integrations are built around a simple assumption: store one refresh token, reuse it until it expires or is revoked, and periodically exchange it for fresh access tokens. That model still works with some providers, but it is not universal.
A growing number of authorization servers use refresh token rotation. In that model, a successful token refresh returns a new refresh token, and the previously used token is no longer meant to be reused. The OAuth 2.0 Security Best Current Practice, published as RFC 9700 in January 2025, requires that refresh tokens issued to public clients be either sender-constrained or rotated, because both approaches reduce the value of a stolen refresh token.
The implementation trap is straightforward: if the refresh flow treats the refresh token as read-only, the application can successfully obtain a new access token while discarding the only refresh token that will work on the next cycle. The next scheduled refresh then submits an outdated token and receives invalid_grant.
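To make the trap concrete, here is a toy simulation. `RotatingTokenServer`, `naive_refresh`, and `persisting_refresh` are hypothetical names for illustration only. The fake server rotates on every refresh and invalidates the old token immediately, which is the strictest behavior a provider might have.

```python
import secrets


class InvalidGrant(Exception):
    """Raised by the toy server when a refresh token is not the current one."""


class RotatingTokenServer:
    """Hypothetical authorization server that rotates on every refresh.

    The previous refresh token stops working immediately (no grace window).
    """

    def __init__(self) -> None:
        self.current_refresh = secrets.token_urlsafe(16)

    def refresh(self, refresh_token: str) -> dict:
        if refresh_token != self.current_refresh:
            raise InvalidGrant("invalid_grant")
        self.current_refresh = secrets.token_urlsafe(16)  # rotate
        return {"access_token": secrets.token_urlsafe(16),
                "refresh_token": self.current_refresh}


def naive_refresh(server: RotatingTokenServer, stored: dict) -> str:
    # Bug: treats the stored refresh token as read-only.
    return server.refresh(stored["refresh_token"])["access_token"]


def persisting_refresh(server: RotatingTokenServer, stored: dict) -> str:
    response = server.refresh(stored["refresh_token"])
    if "refresh_token" in response:
        stored["refresh_token"] = response["refresh_token"]  # keep the replacement
    return response["access_token"]
```

With `naive_refresh`, the first cycle succeeds and the second raises `InvalidGrant`, even though nothing about the client changed. `persisting_refresh` keeps working across any number of cycles.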
That error is easy to misread. It can also appear when a user revokes consent, when the authorization server rejects the grant for policy reasons, or when the submitted token is otherwise invalid. In practice, that means teams often start by checking scopes, client secrets, and provider status pages instead of checking whether the stored refresh token was actually updated.
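One way to steer triage toward the right layer is to key the first debugging step off the standard error code in the token endpoint's JSON response. The sketch below uses the error codes defined in RFC 6749 section 5.2; the hint strings and the `triage_token_error` name are illustrative assumptions, and providers may add their own codes or `error_description` text.

```python
import json

# First place to look for each RFC 6749 section 5.2 error code.
# These hints are a debugging heuristic, not provider-specific guidance.
TOKEN_ERROR_HINTS = {
    "invalid_client": "client authentication: client_id, client_secret, auth method",
    "invalid_grant": "grant state: is the submitted refresh token the latest persisted one?",
    "invalid_scope": "requested scopes",
    "unauthorized_client": "client registration: is this grant type allowed?",
    "unsupported_grant_type": "request body: grant_type value",
    "invalid_request": "request body: missing or duplicated parameters",
}


def triage_token_error(status: int, body: str) -> str:
    """Return a first debugging hint for a failed token-endpoint response."""
    try:
        error = json.loads(body).get("error", "")
    except (json.JSONDecodeError, AttributeError):
        return f"HTTP {status}: non-JSON error body; check proxies and the token URL"
    hint = TOKEN_ERROR_HINTS.get(error, "unrecognized error code; see provider docs")
    return f"HTTP {status} {error}: check {hint}"
```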
TL;DR: Treat token refresh as a state-changing operation: persist the new refresh token durably before relying on the refreshed session.
The core fix is not complicated, but it does require discipline. A refresh operation is not just a read from storage followed by an HTTP call. It is a state transition. If the provider returns a replacement refresh token, that token becomes part of the system's durable authentication state.
Here is sanitized pseudo-code for the safer pattern:
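```python
from typing import Any, Awaitable, Callable

REFRESH_KEY = "integration/refresh-token"
ACCESS_KEY = "integration/access-token"


async def refresh_access_token(
    secret_store: Any,  # placeholder: async client exposing read(key) and write(key, value)
    http_post: Callable[[str, dict], Awaitable[dict]],  # placeholder: POSTs a form, returns parsed JSON
    provider_token_url: str,
) -> str:
    current_refresh = await secret_store.read(REFRESH_KEY)
    response = await http_post(provider_token_url, {
        "grant_type": "refresh_token",
        "refresh_token": current_refresh,
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
    })
    new_access_token = response["access_token"]
    # Not every provider rotates, so only persist a replacement if one was issued.
    new_refresh_token = response.get("refresh_token")
    if new_refresh_token:
        # Persist before anything relies on the refreshed session: the token
        # just submitted may already be invalid.
        await secret_store.write(REFRESH_KEY, new_refresh_token)
    await secret_store.write(ACCESS_KEY, new_access_token)
    return new_access_token
```

Two details matter here.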
First, not every provider returns a new refresh token on every refresh. Some return one only in certain circumstances, and some do not rotate at all. That is why the example checks for response.get("refresh_token") instead of assuming the field is always present. The correct behavior depends on the provider's documentation.
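A small helper can make the "rotate only if present" rule explicit and report whether a write is needed. This is a sketch assuming the provider simply omits `refresh_token` when it does not rotate; `RefreshOutcome` and `resolve_refresh_outcome` are hypothetical names.

```python
from typing import Mapping, NamedTuple


class RefreshOutcome(NamedTuple):
    access_token: str
    refresh_token: str  # the token to submit on the next refresh
    rotated: bool       # True if the provider issued a replacement


def resolve_refresh_outcome(response: Mapping[str, object],
                            submitted_refresh: str) -> RefreshOutcome:
    """Decide which refresh token to keep after a successful refresh.

    Assumes a non-rotating provider omits refresh_token; an empty string is
    treated the same as an absent field.
    """
    access = str(response["access_token"])
    replacement = response.get("refresh_token")
    if isinstance(replacement, str) and replacement:
        return RefreshOutcome(access, replacement, True)
    return RefreshOutcome(access, submitted_refresh, False)
```

The `rotated` flag is a convenient place to hang a metric, which makes it easier to confirm whether a given provider actually rotates in practice.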
Second, durability matters more than sequencing alone. In a production system, the write of the replacement token should be confirmed before downstream work depends on the refreshed session. If the process crashes after the provider has issued a replacement token but before local state is updated, recovery becomes provider-specific. Some providers invalidate the old token immediately; others allow a brief reuse interval. The safest assumption is that the old token should no longer be trusted.
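If the token lives in a local file rather than a secret manager (an assumption made here only to keep the example self-contained), "durable" means the write survives a crash and is never observed half-finished. A common POSIX pattern is write to a temp file, `fsync`, then atomically rename; `durable_write` is a hypothetical helper name.

```python
import os
import tempfile


def durable_write(path: str, value: str) -> None:
    """Atomically replace `path` with `value`, flushing data to disk first.

    Readers see either the old token or the new one, never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-token-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(value)
            tmp.flush()
            os.fsync(tmp.fileno())  # data is on disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Best effort: fsync the directory so the rename itself survives a crash.
    try:
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except OSError:
        pass  # e.g. platforms that cannot open or fsync a directory
```

`tempfile.mkstemp` creates the file readable and writable only by the current user, and those permissions carry over after the rename. With a remote secret manager the equivalent rule is simpler: wait for the write call to report success before continuing.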
Where possible, teams should also serialize concurrent refresh attempts for the same credential set. Two workers refreshing the same token at nearly the same time can create a race in which one worker persists a token that another worker has already superseded.
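Within a single process, a per-credential lock is the simplest way to serialize refreshes. This is a minimal asyncio sketch: `SerializedRefresher` is a hypothetical name, and `do_refresh` stands in for a routine like `refresh_access_token` above that reads the current token, exchanges it, and persists the replacement. It does nothing for separate worker processes, which need a shared lock or lease.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict


class SerializedRefresher:
    """Allow at most one refresh per credential key at a time (in-process)."""

    def __init__(self, do_refresh: Callable[[str], Awaitable[str]]) -> None:
        self._do_refresh = do_refresh
        # One lock per credential key, created lazily inside the event loop.
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def refresh(self, key: str) -> str:
        async with self._locks[key]:
            # Only one coroutine per key gets here at a time, so two exchanges
            # can never submit the same refresh token concurrently.
            return await self._do_refresh(key)
```

Because `do_refresh` runs inside the lock, it should read the refresh token from the source of truth after the lock is acquired, not before, so it always submits the latest value.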
TL;DR: Even after a correct write, a local cache can keep returning an outdated refresh token unless the cache is bypassed or invalidated.
The second failure mode is more frustrating because the write path can be correct while the read path is still wrong. A secret manager client, SDK, or wrapper may cache values in memory or on disk to reduce latency and API calls. That is often fine for static configuration. It is risky for credentials that can rotate on use.
If a refresh token is updated in the source of truth but the next read comes from a stale cache entry, the application behaves as if the write never happened. The result is the same: the next refresh attempt can submit an outdated token and fail.
The exact time-to-live is implementation-specific. Some SDKs cache aggressively, some do not cache at all, and some leave caching to the application layer. Because of that, the safe guidance is architectural rather than vendor-specific: do not assume a secret read immediately reflects a prior secret write unless the storage layer or client explicitly guarantees read-after-write behavior for that key path.
A simplified failure timeline looks like this:
| Time | Event | Cache State | Source of Truth |
|---|---|---|---|
| T+0 | Refresh succeeds and returns replacement token | Old token still cached | New token stored |
| T+1 | Application reads refresh token again | Cache returns old token | New token available |
| T+N | Next refresh submits the stale token and the provider rejects it with invalid_grant | Cache or local state still stale | New token stored but never used |
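The timeline can be reproduced with a toy read-through cache that has a fixed TTL and no invalidation on write. `SourceOfTruth` and `TTLCachedReader` are hypothetical stand-ins, not any vendor's SDK; the injectable clock only makes the timing deterministic.

```python
import time
from typing import Callable, Dict, Tuple


class SourceOfTruth:
    """Toy secret store: a write is immediately visible to direct reads."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}


class TTLCachedReader:
    """Read-through cache with a fixed TTL that is NOT invalidated on write."""

    def __init__(self, store: SourceOfTruth, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def read(self, key: str) -> str:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]  # served from cache; the store is not consulted
        value = self._store.data[key]
        self._cache[key] = (value, now + self._ttl)
        return value


# Replay the timeline with a fake clock.
now = [0.0]
store = SourceOfTruth()
store.data["integration/refresh-token"] = "old-token"
reader = TTLCachedReader(store, ttl_seconds=300, clock=lambda: now[0])

reader.read("integration/refresh-token")               # T+0: cache warmed with old-token
store.data["integration/refresh-token"] = "new-token"  # rotation persisted to the store
now[0] = 1.0
print(reader.read("integration/refresh-token"))        # T+1: prints old-token
```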
The practical fix is to invalidate or bypass the cache for rotating secrets:
```python
async def persist_refresh_token(secret_store, new_refresh_token: str) -> None:
    await secret_store.write("integration/refresh-token", new_refresh_token)
    # Hypothetical method: targeted invalidation is named differently (or absent) per client.
    secret_store.invalidate_cache("integration/refresh-token")
```

If the client library does not support targeted invalidation, the safer alternative is to disable caching for refresh-token reads entirely. Access tokens and static configuration can still be cached under tighter rules; rotating refresh tokens usually should not be.
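If you control the caching layer, one way to encode that rule is a wrapper that writes through to the backend, drops the cached copy on write, and never caches keys marked as rotating. This is a sketch: `ConsistentSecretCache` is a hypothetical name, and `backend` is assumed to expose synchronous `read(key)` and `write(key, value)` methods.

```python
import time
from typing import Any, Callable, Dict, FrozenSet, Tuple


class ConsistentSecretCache:
    """Write-through cache that never serves rotating secrets from memory."""

    def __init__(self, backend: Any, ttl_seconds: float,
                 rotating_keys: FrozenSet[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._backend = backend
        self._ttl = ttl_seconds
        self._rotating = rotating_keys
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def read(self, key: str) -> str:
        if key in self._rotating:
            return self._backend.read(key)  # always read the source of truth
        hit = self._cache.get(key)
        if hit is not None and self._clock() < hit[1]:
            return hit[0]
        value = self._backend.read(key)
        self._cache[key] = (value, self._clock() + self._ttl)
        return value

    def write(self, key: str, value: str) -> None:
        self._backend.write(key, value)  # write the source of truth first
        self._cache.pop(key, None)       # then drop any cached copy
```

Invalidation on write only fixes the process that performed the write; other processes holding their own caches still see the old value until their TTL expires. That is the main reason to bypass the cache for rotating keys rather than rely on invalidation alone.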
TL;DR: Refresh token rotation improves security, but only if applications store, read, and redact tokens correctly.
Refresh token rotation exists to reduce replay value. If an attacker steals a refresh token, rotation can limit how long that token remains useful. But the security benefit depends on correct implementation.
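Three practices matter most:
- Store: persist every replacement refresh token durably and confirm the write before relying on the refreshed session.
- Read: fetch refresh tokens from the source of truth, not from a cache that may predate the last rotation.
- Redact: keep tokens out of logs, traces, and error reports, including raw access_token and refresh_token values.

One nuance is worth calling out: a revoked or rotated token is not necessarily "usable" just because it remains in a cache. In many systems, the authorization server will reject it. The real risk is that the application keeps attempting to use invalid credentials, prolongs outages, and may delay remediation during a security event because operators believe the system has already switched to the new state.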
That is why rotating-token integrations should be designed as write-through, read-consistent systems. The source of truth for the refresh token must be unambiguous, and every component that reads it must see the latest value quickly enough to avoid reuse of stale state.
Refresh token rotation is an OAuth security pattern in which the authorization server issues a replacement refresh token during a refresh flow instead of expecting the same token to be reused indefinitely. RFC 9700 recommends rotation or sender-constrained refresh tokens as stronger defenses against token replay, but provider behavior varies. Some rotate on every refresh, some rotate conditionally, and some still use stable refresh tokens.
invalid_grant can appear even when client credentials are correct because it refers to the grant material being rejected, not to client authentication failing (RFC 6749 reports client authentication failures as invalid_client). A stale, expired, revoked, malformed, or already-used refresh token can trigger it. When debugging, compare the token being submitted with the most recently persisted token and check the provider's rules for token reuse and grace periods.
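Comparing tokens directly means handling raw secrets while debugging. A safer habit is to log a short hash fingerprint of the submitted token and of the latest persisted token. The helper names below are hypothetical, and the 12-character truncation is an arbitrary choice for readability.

```python
import hashlib
import hmac


def token_fingerprint(token: str) -> str:
    """Short, non-reversible identifier for a token that is safe to log.

    Truncated SHA-256: enough to tell tokens apart, not a security control.
    """
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]


def submitted_token_is_latest(submitted: str, latest_persisted: str) -> bool:
    # Constant-time comparison avoids leaking token prefixes through timing.
    return hmac.compare_digest(submitted.encode("utf-8"),
                               latest_persisted.encode("utf-8"))
```

On an invalid_grant, logging `token_fingerprint(submitted)` next to `token_fingerprint(latest)` narrows the search: matching fingerprints suggest the cause lies elsewhere (revocation, expiry, or another worker using the token first), while differing ones point to a stale read or a missed write.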
Caching refresh tokens is usually not worth it if the provider rotates them or can revoke them asynchronously. The performance gain is small compared with the operational risk. A better pattern is to cache short-lived access tokens conservatively while reading refresh tokens from the source of truth on demand.
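A minimal sketch of the access-token side of that pattern: cache the token in memory and treat it as expired a little early, so requests do not race the real expiry. `AccessTokenCache` and the 60-second default margin are assumptions. `expires_in` is the optional lifetime field (in seconds) from the token response; if a provider omits it, you will need your own default.

```python
import time
from typing import Callable, Optional


class AccessTokenCache:
    """Caches one access token and expires it early by a safety margin.

    Refresh tokens are deliberately not stored here; they are read from the
    source of truth whenever a refresh is needed.
    """

    def __init__(self, skew_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def put(self, access_token: str, expires_in: float) -> None:
        self._token = access_token
        self._expires_at = self._clock() + max(0.0, expires_in - self._skew)

    def get(self) -> Optional[str]:
        if self._token is not None and self._clock() < self._expires_at:
            return self._token
        return None  # caller should refresh
```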
Concurrent refreshes of the same credential set are a common source of hard-to-reproduce failures. If multiple workers can refresh the same token set, they need coordination such as a lock, lease, or single-flight refresh mechanism. Without it, one worker can overwrite or invalidate state another worker is still using.
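Single-flight avoids duplicate work as well as races: concurrent callers share one in-flight refresh instead of each starting their own. A minimal asyncio sketch, assuming all callers run in one event loop; `SingleFlight` is a hypothetical name, modeled loosely on Go's `singleflight` package.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Coalesce concurrent calls for the same key into one in-flight task.

    In-process only: separate workers still need a shared lock or lease,
    for example a row lock in the database that holds the token.
    """

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Task[str]"] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[str]]) -> str:
        task = self._inflight.get(key)
        if task is None:
            # First caller starts the refresh; later callers reuse its task.
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            task.add_done_callback(lambda t: self._clear(key, t))
        # shield(): one cancelled caller must not cancel the shared refresh.
        return await asyncio.shield(task)

    def _clear(self, key: str, task: "asyncio.Task[str]") -> None:
        # Forget the task once it finishes so the next refresh starts fresh.
        if self._inflight.get(key) is task:
            del self._inflight[key]
```

If the shared refresh fails, every waiting caller receives the same exception, which is usually what you want: one failed exchange should not be retried five times in parallel.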
Use explicit field-level redaction for refresh_token, access_token, id_token, and authorization headers. Also review HTTP client middleware, exception handlers, and observability tools that capture request or response bodies by default. Generic PII filters are often not enough for OAuth payloads.
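As one example of field-level redaction, Python's standard `logging` module lets you attach a filter that rewrites each record before it is emitted. The field list and regular expressions below are assumptions to adapt, and a filter like this only scrubs the formatted message; tracebacks, structured `extra` fields, and HTTP client debug output need separate handling.

```python
import logging
import re

# Token fields from the redaction list above, plus client_secret; extend as needed.
SENSITIVE_FIELDS = ("refresh_token", "access_token", "id_token", "client_secret")

# JSON-style ("field": "value") and form-style (field=value) pairs.
FIELD_RE = re.compile(
    r'("?(?:' + "|".join(SENSITIVE_FIELDS) + r')"?\s*[:=]\s*"?)([^"&\s,}]+)'
)
# Authorization header values, keeping the scheme: "Bearer [REDACTED]".
AUTH_RE = re.compile(r"(?i)(authorization\s*[:=]\s*)((?:bearer|basic)\s+)?\S+")


class RedactTokensFilter(logging.Filter):
    """Replace each record's message with a redacted, pre-formatted copy."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        message = FIELD_RE.sub(r"\1[REDACTED]", message)
        message = AUTH_RE.sub(r"\1\2[REDACTED]", message)
        # Freeze the redacted text so handlers do not re-apply the original args.
        record.msg, record.args = message, None
        return True
```

Attach it with `handler.addFilter(RedactTokensFilter())` on each handler rather than on a single logger, so records propagated from child loggers are covered too.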
invalid_grant is often a state problem, not a client-credential problem. Check token lifecycle handling before assuming revocation or misconfiguration.
The hardest OAuth bugs are often not protocol bugs at all. They are state-management bugs: a replacement token that was never persisted, a cache that returned yesterday's truth, or two workers racing to refresh the same credential. Refresh token rotation is increasingly common because it improves security, but it also raises the bar for implementation quality. Teams that treat refresh as a transactional state change rather than a simple HTTP exchange are far less likely to be surprised by invalid_grant in production.