The 4,100× /models Endpoint Speedup in OpenClaw, Explained

Q: If I am on v2026.5.26 or later, do I already have the fix?

Yes. Any version from v2026.5.22 onward includes the /models endpoint speedup.

🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4

On May 24, 2026, OpenClaw shipped v2026.5.22 with a clear performance win: the /models endpoint dropped from roughly 30 seconds to under 10 milliseconds, a 4,100× speedup. If an OpenClaw setup relies on model pickers, provider routing, or any workflow that enumerates available models, that change removes a major source of friction immediately. The practical takeaway is simple: run v2026.5.22 or newer.

This matters because agent responsiveness is not only about LLM inference time. A personal agent can feel slow when a small but frequently used endpoint blocks the rest of the workflow. In OpenClaw, /models is one of those foundational calls. When it was slow, model switching and provider-aware tooling felt slow too. Now that it returns in single-digit milliseconds, that overhead largely disappears.

What the `/models` Endpoint Does

TL;DR: /models returns the models available to the agent, so anything that needs to list or choose a model depends on it.

The /models endpoint answers a basic question: what models can this OpenClaw agent use right now? In practice, that makes it a dependency for several common tasks:

Model-selection interfaces that populate a picker or dropdown
Routing logic that chooses among providers or model tiers
Tooling and integrations that enumerate capabilities during setup
Provider-aware workflows that need to know what is available before sending work

That is why a slow /models call was more than a cosmetic issue. It sat near the start of many workflows, so its latency was felt repeatedly.

Why a 30-Second Call Hurt So Much

TL;DR: A ~30-second /models response delayed model switching, tooling, and any workflow that listed providers.

Thirty seconds is an eternity for an endpoint that exists to enumerate options. If a UI had to wait that long before showing available models, switching providers felt broken. If a routing layer checked available models before dispatching work, the orchestration overhead could exceed the time spent on the actual model response.

That mismatch is the real lesson here. People often focus on inference latency because it is the most visible part of an AI workflow. But users experience the whole path: loading the model list, choosing a provider, checking availability, then finally sending the request. A single slow endpoint near the front of that path can make the entire system feel sluggish.

For OpenClaw specifically, the /models endpoint lists the models available to the agent, so a ~30-second call was a real drag on model switching, tooling, and anything that enumerated providers. Cutting that to under 10 milliseconds removes that friction.

What the 4,100× Speedup Changes in Practice

TL;DR: With /models now returning in under 10 milliseconds, model-selection and routing workflows feel near-instant instead of blocked.

The improvement in v2026.5.22 is easy to understand because the before-and-after numbers are so stark:

Endpoint state	Response time	Practical effect
Before v2026.5.22	~30 seconds	Noticeable delay before model-aware workflows can proceed
v2026.5.22 and newer	Under 10 milliseconds	Endpoint overhead is effectively invisible

That changes the feel of the system in a few ways:

Model switching becomes immediate rather than something that interrupts flow
Routing logic can enumerate models without adding meaningful delay
Provider-aware tools initialize faster because capability discovery is no longer a bottleneck
The agent feels more responsive overall even though the LLM itself did not change

This is also a useful reminder that agent performance is cumulative. Fast inference helps, but so do fast metadata endpoints, fast tool calls, and fast orchestration layers. A personal agent feels polished when the small operations are fast enough to disappear.

How to Check Whether You Have the Fix

TL;DR: If OpenClaw is on v2026.5.22 or newer, the speedup is already included.

The action item here is straightforward: make sure the installation is on v2026.5.22 or newer.

A generic version check might look like this:

openclaw --version

If the reported version is 2026.5.22 or later, the /models speedup is included. For release context, the stable tags around this change were:

2026.5.22 — May 24, 2026
2026.5.26 — May 27, 2026
2026.5.27 — May 28, 2026
2026.5.28 — May 30, 2026
2026.6.1 — June 3, 2026

A generic endpoint check might look like this:

curl -o /dev/null -s -w "Total time: %{time_total}s\n" \
  http://localhost:YOUR_PORT/v1/models

On a current build, the response should be effectively instantaneous from a user perspective. The main point is not to benchmark obsessively; it is to confirm the installation is on a release that includes the fix.

For release details, see the OpenClaw releases page: https://github.com/openclaw/openclaw/releases

Faster Does Not Mean Public

TL;DR: A faster /models endpoint is still an endpoint that should not be reachable from the open internet.

Performance improvements do not change the security model. /models may look harmless compared with prompt or tool endpoints, but it still exposes useful information about what an agent can access. That makes it part of the attack surface.

This connects to a broader lesson from May: exposed AI application endpoints remain a real problem. The RedAccess exposure theme from early May 2026 highlighted the scale of internet-facing deployments, with roughly 380,000 apps scanned and about 5,000 found with no authentication. The takeaway here is narrower but important: even when an endpoint gets dramatically faster, it still belongs behind proper access controls.

Good baseline practice includes:

Keep agent APIs off the public internet unless there is a deliberate, secured reason to expose them
Require authentication for every endpoint, including metadata endpoints like /models
Prefer localhost or private-network access for self-hosted setups
Review firewall and reverse-proxy rules so convenience does not become exposure

The speedup in v2026.5.22 is a performance story, not a permission change.

Frequently Asked Questions

Q: Do I need to change any configuration to get the 4,100× `/models` speedup?

No. The practical requirement is simply to run OpenClaw v2026.5.22 or newer.

Q: Does this make model responses faster too?

Not directly. The change affects the endpoint that lists available models, not the inference path itself. What improves is the overhead around model selection, routing, and provider-aware tooling.

Q: Why does a metadata endpoint matter so much?

Because metadata endpoints often sit at the start of a workflow. If the system cannot quickly determine which models are available, the rest of the request path waits behind that step.

Q: If I am on v2026.5.26 or later, do I already have the fix?

Yes. Any version from v2026.5.22 onward includes the /models endpoint speedup.

Q: Should `/models` ever be exposed publicly for discovery?

In general, no. It reveals information about the agent's available providers and models, so it should be treated like the rest of the agent API and protected accordingly.

Key Takeaways

OpenClaw v2026.5.22 shipped on May 24, 2026 with a 4,100× /models endpoint speedup
The endpoint went from ~30 seconds to under 10 milliseconds
That matters because /models is used to enumerate models available to the agent
The improvement directly benefits model switching, tooling, and routing workflows
The practical action item is simple: run v2026.5.22 or newer
Faster endpoints still need proper protection: do not expose agent APIs to the open internet

Conclusion

The /models improvement in OpenClaw v2026.5.22 is a strong example of how much perceived speed depends on the small pieces around the model, not just the model itself. A ~30-second metadata call was enough to drag down model switching and provider-aware workflows; cutting that to under 10 milliseconds removes a major source of friction.

That is the broader lesson for agent builders in 2026: responsiveness comes from the whole system. Fast inference matters, but so do fast orchestration endpoints, fast capability checks, and disciplined security around every exposed surface. In this case, the fix is refreshingly simple: update to v2026.5.22 or newer and the bottleneck is gone.