
🤖 Ghostwritten by GPT 5.4 · Fact-checked & edited by Claude Opus 4.6
Teaching an agent to drive a real hotel booking site turned out to be the hardest part of Outback's mid-May build. The core lesson: when no usable public booking API exists, browser automation becomes the integration surface — and the hard part is not writing Playwright scripts. The hard part is surviving the realities of a commercial, React-heavy, anti-bot-protected site without breaking state, losing sessions, or fooling yourself during debugging.
By early June 2026, the practical shape of the solution was clear. Outback needed two things working together: a Playwright session manager that could support a human-handled login and persisted headed session, and a GraphQL availability contract that gave the rest of the system a stable way to ask for live room availability. The engineering pain lived in the seams. Headless detection blocked the obvious approach. Controlled inputs ignored fill(). Re-navigation erased in-progress form state. And one build mistake hid a TypeScript failure long enough to create a stale deployment problem.
This post is the failure-rich part of the arc: what broke, what changed, and which browser automation lessons were expensive enough to be worth writing down.
TL;DR: When a booking flow has no clean public API, the browser is the API — which means reliability depends on handling UI state, anti-bot behavior, and session persistence as first-class engineering concerns.
The most important architectural decision was accepting the constraint instead of fighting it. For this workflow, there was no clean public hotel booking API that exposed the needed live availability behavior in a way the agent could depend on. That forced the system toward browser automation.
That sounds straightforward until the target is a hardened commercial booking experience rather than a demo app. In practice, the browser is not just a rendering surface. It is where authentication, anti-bot checks, client-side validation, dynamic availability loading, and user state all converge. Once the browser becomes the integration point, the design has to treat it as production infrastructure rather than a disposable script runner.
The implementation that emerged in mid-May had two layers:
| Layer | Purpose | Why it mattered |
|---|---|---|
| Playwright session manager | Launch a headed browser, support human-handled MFA, persist authenticated session state, and resume that state later | Without this layer, every run started from scratch and repeatedly hit the same login friction |
| GraphQL availability contract | Give the rest of the agent ecosystem a stable interface for requesting availability data | Without this layer, browser-driving logic leaked into unrelated planning and decision code |
The GraphQL contract mattered because browser automation code tends to spread. If every agent action directly manipulates the page, the rest of the system becomes tightly coupled to selectors, timing quirks, and UI changes. A contract boundary contained that damage. The booking-site adapter could own the ugly details while the rest of Outback asked a cleaner question: what availability exists for these dates and constraints?
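A minimal sketch of what such a contract boundary could look like in TypeScript. The schema, the field names, and the AvailabilitySource interface are assumptions made for illustration, not Outback's actual contract. The idea it shows is that the resolver validates the request and then delegates to an adapter, so nothing outside that adapter ever sees a selector or a page timing.

```typescript
// Hypothetical schema: type, field, and argument names are illustrative only.
const availabilitySchema = /* GraphQL */ `
  type RoomAvailability {
    roomType: String!
    nightlyRate: Float
    available: Boolean!
  }

  type Query {
    availability(checkIn: String!, checkOut: String!, guests: Int!): [RoomAvailability!]!
  }
`;

interface AvailabilityRequest {
  checkIn: string; // ISO date, e.g. "2026-06-18"
  checkOut: string;
  guests: number;
}

interface RoomAvailability {
  roomType: string;
  nightlyRate: number | null;
  available: boolean;
}

// The only view of the booking site that code outside the adapter gets.
// A Playwright-backed adapter would implement this (hypothetical interface).
interface AvailabilitySource {
  fetchAvailability(req: AvailabilityRequest): Promise<RoomAvailability[]>;
}

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Reject malformed requests at the boundary, before any browser work starts.
function validateRequest(req: AvailabilityRequest): string[] {
  const errors: string[] = [];
  if (!ISO_DATE.test(req.checkIn)) errors.push("checkIn must be YYYY-MM-DD");
  if (!ISO_DATE.test(req.checkOut)) errors.push("checkOut must be YYYY-MM-DD");
  // YYYY-MM-DD strings sort chronologically, so string comparison is safe here.
  if (errors.length === 0 && req.checkOut <= req.checkIn) {
    errors.push("checkOut must be after checkIn");
  }
  if (!Number.isInteger(req.guests) || req.guests < 1) {
    errors.push("guests must be a positive integer");
  }
  return errors;
}

// Resolver map in the (parent, args) shape most JavaScript GraphQL servers accept.
function makeResolvers(source: AvailabilitySource) {
  return {
    Query: {
      availability: async (_parent: unknown, args: AvailabilityRequest) => {
        const errors = validateRequest(args);
        if (errors.length > 0) throw new Error(errors.join("; "));
        return source.fetchAvailability(args);
      },
    },
  };
}
```

The schema string and the resolver map are shaped for servers such as Apollo Server, which take type definitions plus resolvers. Planning code asks for availability through the contract and never learns that a browser sits behind it.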
React's own documentation emphasizes that controlled form elements derive their value from component state rather than raw DOM mutation. That distinction sounds academic until an automation run "types" a date that looks correct in the DOM but never actually updates application state. Similarly, Playwright's official documentation distinguishes headed and headless execution modes, and commercial anti-bot systems routinely treat those modes differently.
The practical takeaway: if the browser is the only path, design around the browser's realities from day one. Don't bolt them on after the first flaky run.
TL;DR: The biggest productivity gain came from making browser runs visible and interactive — debugging that took roughly three hours in blind CLI mode shrank to about ten minutes once the live page could be seen and acted on directly.
The single highest-leverage change was moving away from blind command-line script runs and toward interactive browser driving through a Playwright MCP workflow. Before that shift, debugging meant rerunning scripts, reading logs, guessing which selector failed, adding more logs, and trying again. That pattern is survivable on simple pages. It is brutal on a live booking experience with async updates, modals, validation, and anti-bot friction.
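Once the browser was visible and interactive, the debugging loop changed completely. It became possible to watch the page react, inspect its state at the moment a step failed, and act on it directly instead of guessing from logs.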
The result was dramatic. A debugging problem that had consumed roughly three hours in blind runs collapsed to about ten minutes once the session became visible and controllable. That is not a universal ratio, but the directional lesson is strong: for production browser automation, observability is not a luxury.
This is where many automation efforts go wrong. Teams treat browser scripts like backend jobs and expect logs alone to explain failures. But browser-driving failures are often visual and stateful. The page can be "loaded" while the component the script needs is still hydrating. The selector can match an element that exists but is not yet interactable. The click can succeed at the DOM level while the app rejects the action because a hidden validation state has not been satisfied.
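Before clicking, Playwright's actions already wait for the element to be visible, stable, and enabled. Nothing in the framework, however, knows whether the application accepted the action. A small polling helper makes that second check explicit. This is a sketch: the helper name and the default timings are assumptions. Inside Playwright Test, expect.poll() and web-first assertions cover the same ground.

```typescript
// Poll an application-level condition until it holds or the deadline passes.
// Hypothetical helper for adapter code that runs outside the Playwright Test runner.
async function waitForCondition(
  check: () => Promise<boolean>,
  { timeoutMs = 10_000, intervalMs = 250 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep between checks instead of busy-looping.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

For example, after dates are entered, a check such as () => page.getByRole('button', { name: 'Search' }).isEnabled() waits for the app's own validation to accept the input. The button name there is hypothetical. A successful DOM-level click cannot give you that signal.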
A comparison table makes the trade-off clearer:
| Approach | Strength | Failure mode |
|---|---|---|
| Blind CLI script runs | Easy to automate and repeat | Slow diagnosis, poor visibility into UI state |
| Interactive headed Playwright session | Fast debugging, direct observation, easier state inspection | Requires more careful session handling and operational discipline |
| Fully abstracted API integration | Cleaner architecture when available | Not an option when no usable public API exists |
The lesson was not "always debug manually." It was "make the system observable enough that manual diagnosis is possible when the browser becomes the battlefield."
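A run that nobody is watching can still leave visual evidence behind. The minimal sketch below wraps each automation step so that a failure captures a full-page screenshot before the error propagates. Screenshottable and withFailureScreenshot are names invented here. Playwright's Page.screenshot({ path, fullPage }) fits the structural interface, so a real Page can be passed in. Playwright's tracing feature is a richer version of the same idea.

```typescript
// Structural view of the one Page method used here (interface name is ours).
interface Screenshottable {
  screenshot(options: { path: string; fullPage?: boolean }): Promise<unknown>;
}

// Run one step; on failure, save what the page looked like, then rethrow
// the original error so callers still see the real cause.
async function withFailureScreenshot<T>(
  page: Screenshottable,
  label: string,
  step: () => Promise<T>,
  dir = "artifacts",
): Promise<T> {
  try {
    return await step();
  } catch (err) {
    const path = `${dir}/${label}-${Date.now()}.png`;
    try {
      await page.screenshot({ path, fullPage: true });
      console.error(`Step "${label}" failed; screenshot saved to ${path}`);
    } catch {
      // A failed screenshot must not mask the original error.
      console.error(`Step "${label}" failed; screenshot could not be captured`);
    }
    throw err;
  }
}
```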
TL;DR: React-controlled inputs and live polling behavior broke naive automation assumptions; pressSequentially() worked where fill() failed, and polling had to happen without navigation to avoid wiping user-entered state.
The most annoying bug class came from controlled inputs. On many modern sites — especially React-driven forms — the value that matters is not just the DOM field value. It is the application state managed by the component. That meant a field could look populated while the app still behaved as if nothing had been entered.
In this case, the date input exposed the problem clearly. A naive Playwright fill() call appeared to set the value, but the booking flow did not consistently register it. The reliable path was to interact more like a human typist.
```typescript
// fill() mutated the DOM value, but the React-controlled input
// did not reliably commit state; typing key by key did.
await page.locator('input[name="checkInDate"]').pressSequentially('06/18/2026');
```

That one change sounds small, but it represented a broader lesson: browser automation has to respect the application's event model, not just the shape of the DOM. If the site expects sequential key events, focus changes, or blur handlers to commit state, then shortcuts can create false positives.
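Typing is not proof of success. A helper that types, moves focus away, and then checks an application-level signal captures both halves of the lesson. This is a sketch: TypeableField and enterControlledValue are names invented here, and the field is assumed to start empty. The confirm callback stands in for whatever the real booking page shows once it accepts a date. A Playwright Locator already has pressSequentially() and press() with compatible signatures, so a Locator can be passed in directly.

```typescript
// Structural view of the two Locator methods used here (interface name is ours).
interface TypeableField {
  pressSequentially(text: string, options?: { delay?: number }): Promise<void>;
  press(key: string): Promise<void>;
}

// Type key by key, Tab away so blur/change handlers run, then confirm that the
// application (not just the DOM) accepted the value. Assumes the field is empty.
async function enterControlledValue(
  field: TypeableField,
  value: string,
  confirm: () => Promise<boolean>,
  delayMs = 50,
): Promise<void> {
  await field.pressSequentially(value, { delay: delayMs });
  await field.press("Tab");
  if (!(await confirm())) {
    throw new Error(`Application did not accept "${value}"`);
  }
}
```

The confirm step is what separates this from checking the input's own value. It should read something the app controls, such as a date summary or an updated availability list.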
The second hard lesson was polling. Availability needed to be refreshed, but the first implementation used re-navigation as a crude polling mechanism. That worked only until it collided with actual user interaction. Re-navigating the page destroyed in-progress input. Dates, filters, or partially entered values disappeared because the page lifecycle restarted.
That forced a change in strategy. Polling had to happen without navigation — the system needed to let the page stay where it was, preserve the user's state, and refresh only the availability-relevant data path or wait for the site's own dynamic updates.
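A non-navigating poll can be sketched as a loop that re-reads the current page and reports only real changes. Everything here is an illustrative assumption. pollWithoutNavigation is a hypothetical name, and readSnapshot stands in for something like reading room cards with Playwright's locator.allTextContents(). The important property is what the loop never does. It never calls page.goto() or page.reload(), so typed dates and filters survive every iteration.

```typescript
// Re-read availability from the page as it currently is; never navigate.
// readSnapshot is caller-supplied (hypothetical), e.g. room-card text from the DOM.
async function pollWithoutNavigation(
  readSnapshot: () => Promise<string[]>,
  onChange: (rooms: string[]) => void,
  options: { intervalMs: number; maxPolls: number },
): Promise<number> {
  let previous: string | null = null;
  let changes = 0;
  for (let i = 0; i < options.maxPolls; i++) {
    const rooms = await readSnapshot();
    const key = JSON.stringify(rooms);
    // Report only when the availability data actually differs.
    if (key !== previous) {
      previous = key;
      changes++;
      onChange(rooms);
    }
    if (i < options.maxPolls - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    }
  }
  return changes;
}
```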
This distinction matters beyond hotel search. Any browser automation workflow that shares space with active user state has to separate "refresh data" from "reset page." They are not equivalent operations.
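React's documentation describes the controlled-component model. An input's displayed value comes from component state, and it changes only when the component's own event handlers update that state. Playwright's API reflects a related distinction by offering multiple input methods rather than treating all text entry as identical. Its documentation recommends fill() for most cases and key-by-key typing when the page has special keyboard handling.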
| Problem | Naive approach | What happened | Better approach |
|---|---|---|---|
| Enter date into controlled input | fill() | Value appeared set but app state did not reliably update | pressSequentially() to trigger expected key events |
| Refresh availability | Re-navigate page | In-progress input was wiped | Poll without navigation; preserve current page state |
| Confirm success | Check only DOM value | False confidence | Validate downstream UI reaction or availability change |
This was one of those moments where the browser reminded everyone that "looks correct" is not the same thing as "the application accepted it."
TL;DR: The anti-bot layer rejected headless browsers, so the agent had to run headed, with a human-assisted login flow and persisted authenticated session treated as sensitive state.
Many browser automation tutorials assume headless execution is the default production mode. That assumption broke immediately here. The site's anti-bot protections blocked headless browsers outright. Once that became clear, the runtime model had to change.
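The viable approach was to launch a headed browser, let a human complete the login and MFA, persist the authenticated session state, and resume that session on later runs.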
This was not a cute workaround. It was the only reliable path that respected the site's controls rather than trying to bypass them. The bot challenge was never circumvented. The system operated within the authenticated session that a legitimate human established.
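A session manager built this way needs a cheap check for whether a saved session is still worth resuming or whether a human has to log in again. The minimal sketch below makes two assumptions. The saved file follows Playwright's storage-state format, a cookies array whose expires field is in Unix seconds, with -1 for session cookies. And the site's auth cookie name is known. The names sid and hotel.example in the tests are placeholders. A cookie check is only a first filter, because the server can revoke a session early. The resumed page should still be checked for a logged-in state.

```typescript
// The subset of Playwright's storage-state cookie fields this check reads.
interface StoredCookie {
  name: string;
  domain: string;
  expires: number; // Unix seconds; -1 for a session cookie
}

interface StoredState {
  cookies: StoredCookie[];
}

// Exact host or a true subdomain; "evilhotel.example" must not match "hotel.example".
function matchesDomain(cookieDomain: string, domain: string): boolean {
  const host = cookieDomain.replace(/^\./, "");
  return host === domain || host.endsWith(`.${domain}`);
}

// true  -> launch headed and wait for a human to log in (and complete MFA)
// false -> try resuming the saved session, then verify logged-in state on the page
function needsHumanLogin(
  state: StoredState | null,
  authCookieName: string, // hypothetical: identify the real auth cookie by inspection
  domain: string,
  nowSeconds: number,
  marginSeconds = 15 * 60, // renew early rather than expire mid-run
): boolean {
  if (state === null) return true;
  const auth = state.cookies.find(
    (c) => c.name === authCookieName && matchesDomain(c.domain, domain),
  );
  if (!auth) return true;
  if (auth.expires === -1) return false; // no expiry recorded; rely on the page check
  return auth.expires - nowSeconds < marginSeconds;
}
```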
That decision had security consequences. A persisted, headed, logged-in browser session is sensitive state. It should be treated much closer to a credential than to a cache file. If an attacker gets the session artifact, they may not need the password or MFA prompt at all.
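That leads to three non-negotiable rules: keep the session artifact outside the repository, restrict access to it tightly, and handle it with the same seriousness as any other credential.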
Playwright's documentation supports session persistence patterns through storage state, but the documentation does not remove the operational risk. The risk comes from what the session represents: an authenticated browser context with real privileges on a real commercial site.
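Keeping the session artifact out of source control and away from other users can be enforced in code rather than left to habit. The minimal sketch below uses Node's fs and path modules. It refuses any path inside the repository and writes the file with owner-only permissions, mode 0600 on POSIX systems. Windows largely ignores these mode bits. The function names are hypothetical, and the check complements a .gitignore entry rather than replacing one.

```typescript
import { chmodSync, mkdirSync, writeFileSync } from "node:fs";
import { dirname, isAbsolute, relative, resolve, sep } from "node:path";

// True when targetPath resolves to repoRoot itself or anything beneath it.
function isInsideRepo(targetPath: string, repoRoot: string): boolean {
  const rel = relative(resolve(repoRoot), resolve(targetPath));
  if (rel === "") return true;
  const escapes = rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel);
  return !escapes;
}

// Persist authenticated storage state as a credential, not a cache file.
function saveSessionState(state: unknown, targetPath: string, repoRoot: string): string {
  const abs = resolve(targetPath);
  if (isInsideRepo(abs, repoRoot)) {
    throw new Error(`Refusing to store session state inside the repository: ${abs}`);
  }
  mkdirSync(dirname(abs), { recursive: true, mode: 0o700 });
  writeFileSync(abs, JSON.stringify(state), { mode: 0o600 });
  chmodSync(abs, 0o600); // the mode option only applies when the file is first created
  return abs;
}
```

In a Playwright adapter, state would be the object returned by await context.storageState() once the human has finished logging in. A later run would resume with browser.newContext({ storageState: savedPath }).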
| Session practice | Operational convenience | Security posture |
|---|---|---|
| Re-login every run | Low convenience | Stronger by default, but often impractical with MFA |
| Persist session locally with access controls | Moderate convenience | Acceptable when tightly handled and excluded from source control |
| Commit session artifacts or share loosely | High short-term convenience | Unacceptable |
This section is where "production AI agent" stops sounding abstract. Once an agent can drive a live browser against a real service, the browser session itself becomes part of the trust boundary.
TL;DR: One of the most expensive mistakes was not in Playwright at all — piping build output to tail masked a TypeScript failure, which allowed a stale build artifact to keep running until the mismatch was finally noticed.
Not every painful lesson came from the booking site. One came from the build process.
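At one point, pnpm build | tail was used as a convenience to trim noisy output and focus on the end of the build log. That convenience turned into a trap. A real TypeScript compile failure occurred upstream, but tail prints only the last ten lines by default, so the actual tsc error context was cut off. A pipeline also reports the exit status of its last command unless pipefail is enabled, so pnpm build | tail exits successfully whenever tail does, whatever happened to the build. The result was a false sense that the build had completed cleanly.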
Worse, that mistake combined with stale artifacts in a way that made the system appear more functional than it really was. Code changes were believed to be deployed, but the running output still reflected an older successful build. That meant debugging was happening against the wrong mental model. The browser automation looked inconsistent when the real issue was simpler: the newest code had not actually built.
This kind of failure is worth documenting because it is common in automation-heavy projects. When the runtime is complicated, teams instinctively blame the complicated part first. But sometimes the problem is embarrassingly ordinary: the build failed, and the failure signal got hidden.
The practical safeguards that came out of that incident address both halves of the failure: the hidden error signal and the stale artifact.
There is no glamorous engineering lesson here, just an important one. Stale builds are especially dangerous in browser automation because the feedback loop is already noisy. Anything that makes it harder to trust the binary under test multiplies the confusion.
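Short output doesn't have to cost you the failure signal. Check the build's exit status first, and trim the log only for display. The sketch below does this with Node's child_process.spawnSync. runBuildStep is a hypothetical name. In a plain shell, set -o pipefail (bash, zsh) gives pnpm build | tail the same guarantee.

```typescript
import { spawnSync } from "node:child_process";

// Run a build step and decide success from its exit status, not from what the
// last few lines of output happen to say. Trimming is for display only.
function runBuildStep(command: string, args: string[], tailLines = 10): string {
  const result = spawnSync(command, args, { encoding: "utf8" });
  if (result.error) throw result.error; // e.g. the command was not found
  const output = `${result.stdout ?? ""}${result.stderr ?? ""}`;
  if (result.status !== 0) {
    // Surface the full log: the real error is often far above the tail.
    throw new Error(`${command} ${args.join(" ")} exited with ${result.status}\n${output}`);
  }
  return output.trimEnd().split("\n").slice(-tailLines).join("\n");
}
```

A call such as runBuildStep("pnpm", ["build"]) then either returns a short, trustworthy tail or fails loudly with the complete tsc output.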
In this case, browser automation was necessary because there was no clean public booking API that exposed the required live workflow. When that happens, the browser becomes the integration surface, and reliability depends on handling UI state, authentication, and anti-bot behavior directly.
The input behaved like a React-controlled component, where application state is updated through expected event sequences rather than simple DOM mutation. fill() could make the field look populated, but pressSequentially() more reliably triggered the key events the application needed to commit the value. This is a well-documented distinction in both React's controlled-component model and Playwright's input API.
The agent ran headed because the anti-bot layer rejected headless browsers. The reliable approach was a headed browser with a human-assisted login and persisted authenticated session, rather than trying to bypass the site's protection mechanisms. This also kept the system within the site's terms of use.
The GraphQL availability contract is a stable interface between the rest of the agent system and the booking-site adapter. Instead of exposing selectors and page logic everywhere, the adapter returns structured availability data through a defined contract, which keeps browser-specific complexity contained and prevents automation details from leaking into planning logic.
The persisted authenticated browser session is the biggest risk because it represents live access to a real account context. It should be stored outside the repository, tightly access-controlled, and handled with the same seriousness as other credentials. If compromised, an attacker could act within the authenticated session without needing the password or MFA.
React-controlled inputs can ignore fill() even when the DOM appears correct; pressSequentially() can better match expected user events.

The hardest part of Outback in mid-May was not planning logic or ranking options. It was teaching an agent to survive the messy, stateful, adversarial reality of a live booking site. That work clarified a broader truth about production agents: the interesting engineering often starts where clean APIs end. The next post in the arc introduces the two genuinely new faces on the crew, where browser-driven availability becomes input to a more opinionated decision layer rather than the end of the story.