
🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4
On May 24, 2026, OpenClaw shipped v2026.5.22 with a clear performance win: the /models endpoint dropped from roughly 30 seconds to under 10 milliseconds, a 4,100× speedup. If an OpenClaw setup relies on model pickers, provider routing, or any workflow that enumerates available models, that change removes a major source of friction immediately. The practical takeaway is simple: run v2026.5.22 or newer.
This matters because agent responsiveness is not only about LLM inference time. A personal agent can feel slow when a small but frequently used endpoint blocks the rest of the workflow. In OpenClaw, /models is one of those foundational calls. When it was slow, model switching and provider-aware tooling felt slow too. Now that it returns in single-digit milliseconds, that overhead largely disappears.
TL;DR: /models returns the models available to the agent, so anything that needs to list or choose a model depends on it.
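The /models endpoint answers a basic question: what models can this OpenClaw agent use right now? In practice, several common tasks depend on it, including populating model pickers, routing requests across providers, and running any workflow that enumerates available models.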
That is why a slow /models call was more than a cosmetic issue. It sat near the start of many workflows, so its latency was felt repeatedly.
TL;DR: A ~30-second /models response delayed model switching, tooling, and any workflow that listed providers.
Thirty seconds is an eternity for an endpoint that exists to enumerate options. If a UI had to wait that long before showing available models, switching providers felt broken. If a routing layer checked available models before dispatching work, the orchestration overhead could exceed the time spent on the actual model response.
That mismatch is the real lesson here. People often focus on inference latency because it is the most visible part of an AI workflow. But users experience the whole path: loading the model list, choosing a provider, checking availability, then finally sending the request. A single slow endpoint near the front of that path can make the entire system feel sluggish.
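To make the "whole path" point concrete, the sketch below computes what fraction of a user's wait is spent on the model list rather than on the model itself. The 2-second inference time is a hypothetical figure chosen only for illustration; the 30-second and 10-millisecond values come from the release numbers quoted above.

```python
def metadata_share(models_seconds: float, inference_seconds: float) -> float:
    """Return the fraction of the total wait spent on the /models call."""
    total = models_seconds + inference_seconds
    if total <= 0:
        raise ValueError("total latency must be positive")
    return models_seconds / total


# Hypothetical inference time, assumed purely for illustration.
ASSUMED_INFERENCE_SECONDS = 2.0

# ~30 s /models call before the fix, 10 ms upper bound after it.
before = metadata_share(30.0, ASSUMED_INFERENCE_SECONDS)
after = metadata_share(0.010, ASSUMED_INFERENCE_SECONDS)

print(f"before: {before:.1%} of the wait is the model list")
print(f"after:  {after:.2%} of the wait is the model list")
```

Under that assumption, the model list accounts for roughly 94% of the wait before the fix and about 0.5% after it. The exact split depends entirely on the real inference time, but the direction of the result does not.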
For OpenClaw specifically, the /models endpoint lists the models available to the agent, so a ~30-second call was a real drag on model switching, tooling, and anything that enumerated providers. Cutting that to under 10 milliseconds removes that friction.
TL;DR: With /models now returning in under 10 milliseconds, model-selection and routing workflows feel near-instant instead of blocked.
The improvement in v2026.5.22 is easy to understand because the before-and-after numbers are so stark:
| Endpoint state | Response time | Practical effect |
|---|---|---|
| Before v2026.5.22 | ~30 seconds | Noticeable delay before model-aware workflows can proceed |
| v2026.5.22 and newer | Under 10 milliseconds | Endpoint overhead is effectively invisible |
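The figures in the table can be cross-checked with simple arithmetic. The sketch below treats "roughly 30 seconds" as exactly 30 seconds, which is an assumption, and shows that a 4,100× speedup implies a response time of about 7.3 milliseconds, consistent with "under 10 milliseconds".

```python
def speedup_factor(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the new response is than the old one."""
    return before_seconds / after_seconds


def implied_after_ms(before_seconds: float, factor: float) -> float:
    """Response time in milliseconds implied by a given speedup factor."""
    return before_seconds / factor * 1000.0


BEFORE_SECONDS = 30.0  # "roughly 30 seconds", treated as exact for this check

# A response of exactly 10 ms would be a 3,000x speedup, so "under 10 ms"
# means the speedup is at least that large.
minimum_speedup = speedup_factor(BEFORE_SECONDS, 0.010)

# Working backwards from the quoted 4,100x figure.
implied_ms = implied_after_ms(BEFORE_SECONDS, 4100)

print(f"speedup at 10 ms: about {minimum_speedup:.0f}x")
print(f"latency implied by 4,100x: about {implied_ms:.1f} ms")
```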
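The faster endpoint changes the feel of the system: model pickers can populate without a visible wait, routing layers can check availability before dispatching without adding meaningful delay, and provider-aware tooling no longer stalls on enumeration.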
This is also a useful reminder that agent performance is cumulative. Fast inference helps, but so do fast metadata endpoints, fast tool calls, and fast orchestration layers. A personal agent feels polished when the small operations are fast enough to disappear.
TL;DR: If OpenClaw is on v2026.5.22 or newer, the speedup is already included.
The action item here is straightforward: make sure the installation is on v2026.5.22 or newer.
A generic version check might look like this:
openclaw --version
If the reported version is 2026.5.22 or later, the /models speedup is included. For release context, the stable tags around this change were:
- 2026.5.22: May 24, 2026
- 2026.5.26: May 27, 2026
- 2026.5.27: May 28, 2026
- 2026.5.28: May 30, 2026
- 2026.6.1: June 3, 2026
A generic endpoint check might look like this:
curl -o /dev/null -s -w "Total time: %{time_total}s\n" \
http://localhost:YOUR_PORT/v1/models
On a current build, the response should be effectively instantaneous from a user perspective. The main point is not to benchmark obsessively; it is to confirm the installation is on a release that includes the fix.
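The same check can be scripted. The sketch below is a minimal Python equivalent of the curl command, using only the standard library. The URL reuses the placeholder from the curl example, the 1-second threshold is an arbitrary assumption meant only to separate "about 30 seconds" from "a few milliseconds", and the request assumes no authentication header is required, which may not hold for a given deployment.

```python
import time
import urllib.request
from typing import Callable

# Placeholder URL copied from the curl example; replace YOUR_PORT before use.
MODELS_URL = "http://localhost:YOUR_PORT/v1/models"


def fetch_models(url: str, timeout_seconds: float = 60.0) -> bytes:
    """GET the model list and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()


def timed(call: Callable[[], object]) -> float:
    """Return how many seconds a zero-argument call takes to complete."""
    start = time.perf_counter()
    call()
    return time.perf_counter() - start


def describe(elapsed_seconds: float, slow_threshold_seconds: float = 1.0) -> str:
    """Coarse label for a measurement; the default threshold is an assumption."""
    if elapsed_seconds < slow_threshold_seconds:
        return "fast: consistent with a build that includes the fix"
    return "slow: worth checking the installed version"


# Usage, left commented out so the sketch never touches the network on its own:
# elapsed = timed(lambda: fetch_models(MODELS_URL))
# print(f"{elapsed:.3f}s -> {describe(elapsed)}")
```

If the endpoint requires credentials, `urlopen` raises `urllib.error.HTTPError` instead of returning a body, so a real script would need to add the appropriate header.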
For release details, see the OpenClaw releases page: https://github.com/openclaw/openclaw/releases
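When automating the version check, compare versions numerically rather than as strings: as text, "2026.5.9" sorts after "2026.5.22", which would give the wrong answer. The sketch below assumes the version appears somewhere in the text as three dot-separated numbers. The exact output format of `openclaw --version` is not confirmed here, so the parser simply takes the first such triple it finds.

```python
import re
from typing import Tuple

# First release that includes the /models speedup, per the release notes above.
FIRST_FIXED = (2026, 5, 22)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract the first YEAR.MONTH.PATCH-style triple from a string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    year, month, patch = (int(part) for part in match.groups())
    return (year, month, patch)


def includes_models_speedup(version_text: str) -> bool:
    """True if the version is v2026.5.22 or newer (numeric comparison)."""
    return parse_version(version_text) >= FIRST_FIXED


for candidate in ["2026.5.9", "v2026.5.21", "2026.5.22", "2026.6.1"]:
    print(candidate, includes_models_speedup(candidate))
```

Tuple comparison handles the rollover from 2026.5.28 to 2026.6.1 correctly because the month field is compared before the patch field.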
TL;DR: A faster /models endpoint is still an endpoint that should not be reachable from the open internet.
Performance improvements do not change the security model. /models may look harmless compared with prompt or tool endpoints, but it still exposes useful information about what an agent can access. That makes it part of the attack surface.
This connects to a broader lesson from May: exposed AI application endpoints remain a real problem. The RedAccess exposure theme from early May 2026 highlighted the scale of internet-facing deployments, with roughly 380,000 apps scanned and about 5,000 found with no authentication. The takeaway here is narrower but important: even when an endpoint gets dramatically faster, it still belongs behind proper access controls.
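Good baseline practice is to keep /models behind the same access controls as the rest of the agent API rather than treating it as harmless metadata. The speedup in v2026.5.22 is a performance story, not a permission change.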
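One quick way to check a deployment you own is to request the endpoint without credentials and see what comes back. The sketch below is a generic HTTP probe, not an OpenClaw feature. It assumes a protected server answers unauthenticated requests with 401 or 403, which is common but not confirmed for OpenClaw, and the function names are hypothetical.

```python
import urllib.error
import urllib.request


def interpret_status(status: int) -> str:
    """Map an HTTP status from an unauthenticated request to a coarse verdict."""
    if status in (401, 403):
        return "protected"  # assumption: the server rejects missing credentials this way
    if 200 <= status < 300:
        return "open"  # the model list was served without any credentials
    return "inconclusive"


def probe_without_credentials(url: str, timeout_seconds: float = 10.0) -> str:
    """Send a plain GET with no auth headers and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return interpret_status(response.status)
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the status code is on the exception.
        return interpret_status(error.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Usage, commented out so nothing is probed by accident; run it only against
# a deployment you control, from a network location outside the host:
# print(probe_without_credentials("http://YOUR_HOST:YOUR_PORT/v1/models"))
```

An "open" result from an external network is the case to fix. An "unreachable" result from outside combined with a working local request suggests the endpoint is not internet-facing, though it is not proof.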
Getting the speedup requires nothing beyond running OpenClaw v2026.5.22 or newer.
The speedup does not directly make model responses faster. The change affects the endpoint that lists available models, not the inference path itself. What improves is the overhead around model selection, routing, and provider-aware tooling.
A metadata endpoint matters this much because it often sits at the start of a workflow. If the system cannot quickly determine which models are available, the rest of the request path waits behind that step.
Later releases include the fix too: any version from v2026.5.22 onward includes the /models endpoint speedup.
In general, /models should not be exposed publicly. It reveals information about the agent's available providers and models, so it should be treated like the rest of the agent API and protected accordingly.
The /models improvement in OpenClaw v2026.5.22 is a strong example of how much perceived speed depends on the small pieces around the model, not just the model itself. A ~30-second metadata call was enough to drag down model switching and provider-aware workflows; cutting that to under 10 milliseconds removes a major source of friction.
That is the broader lesson for agent builders in 2026: responsiveness comes from the whole system. Fast inference matters, but so do fast orchestration endpoints, fast capability checks, and disciplined security around every exposed surface. In this case, the fix is refreshingly simple: update to v2026.5.22 or newer and the bottleneck is gone.