Outback Pt. 2: Playwright Browser Automation Lessons

🤖 Ghostwritten by GPT 5.4 · Fact-checked & edited by Claude Opus 4.6

Teaching an agent to drive a real hotel booking site turned out to be the hardest part of Outback's mid-May build. The core lesson: when no usable public booking API exists, browser automation becomes the integration surface — and the hard part is not writing Playwright scripts. The hard part is surviving the realities of a commercial, React-heavy, anti-bot-protected site without breaking state, losing sessions, or fooling yourself during debugging.

By early June 2026, the practical shape of the solution was clear. Outback needed two things working together: a Playwright session manager that could support a human-handled login and a persisted headed session, and a GraphQL availability contract that gave the rest of the system a stable way to ask for live room availability. The engineering pain lived in the seams. Headless detection blocked the obvious approach. Controlled inputs ignored fill(). Re-navigation erased in-progress form state. And one build mistake hid a TypeScript failure long enough to create a stale deployment problem.

This post is the failure-rich part of the arc: what broke, what changed, and which browser automation lessons were expensive enough to be worth writing down.

Browser automation was the only realistic path

TL;DR: When a booking flow has no clean public API, the browser is the API — which means reliability depends on handling UI state, anti-bot behavior, and session persistence as first-class engineering concerns.

The most important architectural decision was accepting the constraint instead of fighting it. For this workflow, there was no clean public hotel booking API that exposed the needed live availability behavior in a way the agent could depend on. That forced the system toward browser automation.

That sounds straightforward until the target is a hardened commercial booking experience rather than a demo app. In practice, the browser is not just a rendering surface. It is where authentication, anti-bot checks, client-side validation, dynamic availability loading, and user state all converge. Once the browser becomes the integration point, the design has to treat it as production infrastructure rather than a disposable script runner.

The implementation that emerged in mid-May had two layers:

Layer	Purpose	Why it mattered
Playwright session manager	Launch a headed browser, support human-handled MFA, persist authenticated session state, and resume that state later	Without this layer, every run started from scratch and repeatedly hit the same login friction
GraphQL availability contract	Give the rest of the agent ecosystem a stable interface for requesting availability data	Without this layer, browser-driving logic leaked into unrelated planning and decision code

The GraphQL contract mattered because browser automation code tends to spread. If every agent action directly manipulates the page, the rest of the system becomes tightly coupled to selectors, timing quirks, and UI changes. A contract boundary contained that damage. The booking-site adapter could own the ugly details while the rest of Outback asked a cleaner question: what availability exists for these dates and constraints?
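To make that boundary concrete, here is a minimal sketch of what such a contract could look like. Everything named here is an assumption for illustration: the schema fields, the `BookingSiteAdapter` interface, and the `resolveAvailability` function are hypothetical, not Outback's actual contract.

```typescript
// Hypothetical SDL for the availability contract; all field names are assumptions.
const availabilitySchema = /* GraphQL */ `
  type RoomAvailability {
    roomType: String!
    ratePerNight: Float!
    currency: String!
  }

  type Query {
    availability(checkIn: String!, checkOut: String!, guests: Int!): [RoomAvailability!]!
  }
`;

interface AvailabilityQuery {
  checkIn: string; // ISO date, e.g. "2026-06-18"
  checkOut: string; // ISO date, must be after checkIn
  guests: number;
}

interface RoomAvailability {
  roomType: string;
  ratePerNight: number;
  currency: string;
}

// The adapter is the only piece that knows about selectors and page timing.
interface BookingSiteAdapter {
  fetchAvailability(query: AvailabilityQuery): Promise<RoomAvailability[]>;
}

// Resolver for Query.availability: validates input, then delegates to the adapter.
async function resolveAvailability(
  adapter: BookingSiteAdapter,
  query: AvailabilityQuery,
): Promise<RoomAvailability[]> {
  const checkIn = Date.parse(query.checkIn);
  const checkOut = Date.parse(query.checkOut);
  if (Number.isNaN(checkIn) || Number.isNaN(checkOut) || checkOut <= checkIn) {
    throw new Error("checkOut must be a valid date after checkIn");
  }
  if (!Number.isInteger(query.guests) || query.guests < 1) {
    throw new Error("guests must be a positive integer");
  }
  return adapter.fetchAvailability(query);
}
```

The point of the shape is that planning code only ever sees `AvailabilityQuery` and `RoomAvailability`. Swapping the Playwright-backed adapter for a fake in tests, or for a real API if one ever appears, would not touch the callers.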

React's own documentation emphasizes that controlled form elements derive their value from component state rather than raw DOM mutation. That distinction sounds academic until an automation run "types" a date that looks correct in the DOM but never actually updates application state. Similarly, Playwright's official documentation distinguishes headed and headless execution modes, and commercial anti-bot systems routinely treat those modes differently.

The practical takeaway: if the browser is the only path, design around the browser's realities from day one. Don't bolt them on after the first flaky run.

Interactive Playwright debugging beat blind scripts by a mile

TL;DR: The biggest productivity gain came from making browser runs visible and interactive — debugging that took roughly three hours in blind CLI mode shrank to about ten minutes once the live page could be seen and acted on directly.

The single highest-leverage change was moving away from blind command-line script runs and toward interactive browser driving through a Playwright MCP workflow. Before that shift, debugging meant rerunning scripts, reading logs, guessing which selector failed, adding more logs, and trying again. That pattern is survivable on simple pages. It is brutal on a live booking experience with async updates, modals, validation, and anti-bot friction.

Once the browser was visible and interactive, the debugging loop changed completely. It became possible to:

See whether the target page actually finished rendering
Inspect whether a field accepted input or visually rejected it
Confirm whether a click triggered a network-backed state change
Observe when a navigation unexpectedly wiped form state
Distinguish timing failures from selector failures

The result was dramatic. A debugging problem that had consumed roughly three hours in blind runs collapsed to about ten minutes once the session became visible and controllable. That is not a universal ratio, but the directional lesson is strong: for production browser automation, observability is not a luxury.

This is where many automation efforts go wrong. Teams treat browser scripts like backend jobs and expect logs alone to explain failures. But browser-driving failures are often visual and stateful. The page can be "loaded" while the component the script needs is still hydrating. The selector can match an element that exists but is not yet interactable. The click can succeed at the DOM level while the app rejects the action because a hidden validation state has not been satisfied.
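Those failure modes can be told apart in code as well as by eye. The sketch below is one hedged way to do it: `diagnoseLocator` and the `LocatorLike` interface are hypothetical names, and the interface lists only the `count()`, `isVisible()`, and `isEnabled()` methods that a Playwright `Locator` provides, so a real locator should satisfy it structurally.

```typescript
// Minimal structural type covering the Playwright Locator methods used below.
interface LocatorLike {
  count(): Promise<number>;
  isVisible(): Promise<boolean>;
  isEnabled(): Promise<boolean>;
}

type LocatorDiagnosis =
  | "selector-miss" // nothing matched: likely a selector problem
  | "ambiguous-selector" // several elements matched: the selector is too loose
  | "not-visible" // matched but hidden: likely timing or hydration
  | "not-enabled" // matched and visible but disabled: likely validation state
  | "ready";

// Classifies why an interaction might fail, checking the cheapest causes first.
async function diagnoseLocator(target: LocatorLike): Promise<LocatorDiagnosis> {
  const matches = await target.count();
  if (matches === 0) return "selector-miss";
  if (matches > 1) return "ambiguous-selector";
  if (!(await target.isVisible())) return "not-visible";
  if (!(await target.isEnabled())) return "not-enabled";
  return "ready";
}
```

Logging the diagnosis next to each failed action gives blind runs at least a coarse answer to "was it the selector or the timing?" before anyone opens a headed session.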

A comparison table makes the trade-off clearer:

Approach	Strength	Failure mode
Blind CLI script runs	Easy to automate and repeat	Slow diagnosis, poor visibility into UI state
Interactive headed Playwright session	Fast debugging, direct observation, easier state inspection	Requires more careful session handling and operational discipline
Fully abstracted API integration	Cleaner architecture when available	Not an option when no usable public API exists

The lesson was not "always debug manually." It was "make the system observable enough that manual diagnosis is possible when the browser becomes the battlefield."

Controlled inputs, polling, and the cost of fighting React

TL;DR: React-controlled inputs and live polling behavior broke naive automation assumptions; pressSequentially() worked where fill() failed, and polling had to happen without navigation to avoid wiping user-entered state.

The most annoying bug class came from controlled inputs. On many modern sites — especially React-driven forms — the value that matters is not just the DOM field value. It is the application state managed by the component. That meant a field could look populated while the app still behaved as if nothing had been entered.

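In this case, the date input exposed the problem clearly. A naive Playwright fill() call appeared to set the value, but the booking flow did not consistently register it. The reliable path was to interact more like a human typist: fill() sets the whole value at once and dispatches a single input event, while pressSequentially() sends a keydown, keypress/input, and keyup event for each character.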

await page.locator('input[name="checkInDate"]').pressSequentially('06/18/2026');
// fill() mutated the DOM value, but the React-controlled input
// did not reliably commit state

That one change sounds small, but it represented a broader lesson: browser automation has to respect the application's event model, not just the shape of the DOM. If the site expects sequential key events, focus changes, or blur handlers to commit state, then shortcuts can create false positives.
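A hedged sketch of that idea: type the value key by key, blur the field so any commit-on-blur handler runs, then confirm through a check that reflects application state rather than the DOM value. `typeAndConfirm`, `TextInputLike`, and the `appAccepted` callback are hypothetical names. The interface lists only methods a Playwright `Locator` provides (`click`, `pressSequentially`, `blur`, `inputValue`). The sketch assumes the field starts empty, because `pressSequentially()` appends to existing text.

```typescript
// Structural subset of Playwright's Locator used by this sketch.
interface TextInputLike {
  click(): Promise<void>;
  pressSequentially(text: string, options?: { delay?: number }): Promise<void>;
  blur(): Promise<void>;
  inputValue(): Promise<string>;
}

// Types into a field the way a person would, then verifies the app reacted.
// `appAccepted` is supplied by the caller, e.g. a check that a results panel
// or a summary label now reflects the entered date.
async function typeAndConfirm(
  input: TextInputLike,
  text: string,
  appAccepted: () => Promise<boolean>,
): Promise<void> {
  await input.click(); // focus first so focus handlers fire
  await input.pressSequentially(text, { delay: 50 }); // per-character key events
  await input.blur(); // some forms only commit state on blur

  if (!(await appAccepted())) {
    // The DOM value goes in the message for debugging only; it is not proof of success.
    const domValue = await input.inputValue();
    throw new Error(`App did not accept "${text}" (DOM value was "${domValue}")`);
  }
}
```

Passing the confirmation in as a callback keeps the helper honest: it cannot report success just because the field looks populated.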

The second hard lesson was polling. Availability needed to be refreshed, but the first implementation used re-navigation as a crude polling mechanism. That worked only until it collided with actual user interaction. Re-navigating the page destroyed in-progress input. Dates, filters, or partially entered values disappeared because the page lifecycle restarted.

That forced a change in strategy. Polling had to happen without navigation — the system needed to let the page stay where it was, preserve the user's state, and refresh only the availability-relevant data path or wait for the site's own dynamic updates.

This distinction matters beyond hotel search. Any browser automation workflow that shares space with active user state has to separate "refresh data" from "reset page." They are not equivalent operations.
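As an illustration of "refresh data" without "reset page", the sketch below re-reads availability text from the page as it stands and never calls `goto()` or `reload()`. It covers only the variant that waits for the site's own dynamic updates. The names `pollForChange`, `PageLike`, and `PollOptions` are hypothetical, and the selector is whatever the target page happens to use. `locator()` and `allTextContents()` are the Playwright methods the interface mirrors.

```typescript
// Structural subset of Playwright's Page used by this sketch.
interface PageLike {
  locator(selector: string): { allTextContents(): Promise<string[]> };
}

interface PollOptions {
  selector: string; // hypothetical selector for availability rows
  intervalMs: number;
  maxAttempts: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-reads the same page until the availability text differs from the baseline.
// Returns the new contents, or null if nothing changed within maxAttempts.
async function pollForChange(
  page: PageLike,
  baseline: string[],
  options: PollOptions,
): Promise<string[] | null> {
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    const current = await page.locator(options.selector).allTextContents();
    if (JSON.stringify(current) !== JSON.stringify(baseline)) return current;
    if (attempt < options.maxAttempts - 1) await sleep(options.intervalMs);
  }
  return null;
}
```

Because the loop only reads, anything the user has typed into the form survives every polling cycle.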

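React's documentation formalizes the controlled-component model: the displayed value is driven by component state, so a change only sticks when the component's own event handlers accept it. Playwright's API reflects the same distinction by offering more than one text-entry method. Its documentation recommends fill() for most fields and reserves pressSequentially() for pages with special keyboard handling, which is the category this date field fell into.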

Problem	Naive approach	What happened	Better approach
Enter date into controlled input	`fill()`	Value appeared set but app state did not reliably update	`pressSequentially()` to trigger expected key events
Refresh availability	Re-navigate page	In-progress input was wiped	Poll without navigation; preserve current page state
Confirm success	Check only DOM value	False confidence	Validate downstream UI reaction or availability change

This was one of those moments where the browser reminded everyone that "looks correct" is not the same thing as "the application accepted it."

Headless detection changed the runtime model

TL;DR: The anti-bot layer rejected headless browsers, so the agent had to run headed, with a human-assisted login flow and persisted authenticated session treated as sensitive state.

Many browser automation tutorials assume headless execution is the default production mode. That assumption broke immediately here. The site's anti-bot protections blocked headless browsers outright. Once that became clear, the runtime model had to change.

The viable approach was:

Run the browser in headed mode
Let a human complete MFA during login
Persist the authenticated session state
Resume that state for later availability checks until the session degraded or expired

This was not a cute workaround. It was the only reliable path that respected the site's controls rather than trying to bypass them. The bot challenge was never circumvented. The system operated within the authenticated session that a legitimate human established.
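The runtime model above can be sketched roughly as follows. This is an assumption-laden outline, not Outback's session manager: `openSession` and the `waitForHumanLogin` callback are hypothetical, and the three interfaces list only the Playwright calls relied on (`launch({ headless: false })`, `newContext({ storageState })`, and `context.storageState({ path })`), so Playwright's `chromium` object should fit the `BrowserTypeLike` parameter.

```typescript
import { existsSync } from "node:fs";

// Structural subsets of Playwright's BrowserType, Browser, and BrowserContext.
interface ContextLike {
  storageState(options: { path: string }): Promise<unknown>;
}
interface BrowserLike {
  newContext(options?: { storageState?: string }): Promise<ContextLike>;
}
interface BrowserTypeLike {
  launch(options: { headless: boolean }): Promise<BrowserLike>;
}

// Resumes a saved session when one exists; otherwise hands control to a human
// for login and MFA, then persists the resulting state for later runs.
async function openSession(
  browserType: BrowserTypeLike,
  statePath: string,
  waitForHumanLogin: (context: ContextLike) => Promise<void>,
): Promise<{ context: ContextLike; resumed: boolean }> {
  // Always headed: the site rejects headless browsers.
  const browser = await browserType.launch({ headless: false });

  if (existsSync(statePath)) {
    const context = await browser.newContext({ storageState: statePath });
    return { context, resumed: true };
  }

  const context = await browser.newContext();
  await waitForHumanLogin(context); // the human completes MFA; nothing is bypassed
  await context.storageState({ path: statePath });
  return { context, resumed: false };
}
```

A resumed session can still be expired on the site's side, so a real caller would want to detect a bounce back to the login page and fall through to the human-login branch again.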

That decision had security consequences. A persisted, headed, logged-in browser session is sensitive state. It should be treated much closer to a credential than to a cache file. If an attacker gets the session artifact, they may not need the password or MFA prompt at all.

That leads to three non-negotiable rules:

Never try to bypass the anti-bot challenge.
Keep persisted session artifacts out of the repository.
Treat the live session as a credential with restricted access and careful lifecycle handling.

Playwright's documentation supports session persistence patterns through storage state, but the documentation does not remove the operational risk. The risk comes from what the session represents: an authenticated browser context with real privileges on a real commercial site.
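Two of those rules are cheap to check mechanically. The sketch below flags a session file that sits inside the repository tree or that is readable by anyone other than its owner. `isInsideDirectory` and `checkSessionArtifact` are hypothetical helpers, and the 0600 expectation is an assumption about a POSIX-style machine, so the permission check is skipped on Windows.

```typescript
import { statSync } from "node:fs";
import { isAbsolute, relative, resolve, sep } from "node:path";

// True when filePath resolves to a location at or below directory.
function isInsideDirectory(filePath: string, directory: string): boolean {
  const rel = relative(resolve(directory), resolve(filePath));
  if (rel === "") return true;
  const escapes = rel === ".." || rel.startsWith(".." + sep);
  return !escapes && !isAbsolute(rel);
}

// Returns a list of problems; an empty list means no issue was detected.
function checkSessionArtifact(statePath: string, repoRoot: string): string[] {
  const problems: string[] = [];

  if (isInsideDirectory(statePath, repoRoot)) {
    problems.push("session file lives inside the repository tree");
  }

  // throwIfNoEntry: false makes statSync return undefined for a missing file.
  const stats = statSync(statePath, { throwIfNoEntry: false });
  if (stats && process.platform !== "win32" && (stats.mode & 0o077) !== 0) {
    problems.push("session file is accessible by group or others; expected mode 0600");
  }

  return problems;
}
```

Running a check like this at startup turns a policy statement into something that fails loudly. It does not replace an ignore rule in version control.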

Session practice	Operational convenience	Security posture
Re-login every run	Low convenience	Stronger by default, but often impractical with MFA
Persist session locally with access controls	Moderate convenience	Acceptable when tightly handled and excluded from source control
Commit session artifacts or share loosely	High short-term convenience	Unacceptable

This section is where "production AI agent" stops sounding abstract. Once an agent can drive a live browser against a real service, the browser session itself becomes part of the trust boundary.

The honest failure: `pnpm build | tail` hid a real compile error

TL;DR: One of the most expensive mistakes was not in Playwright at all — piping build output to tail masked a TypeScript failure, which allowed a stale build artifact to keep running until the mismatch was finally noticed.

Not every painful lesson came from the booking site. One came from the build process.

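At one point, `pnpm build | tail` was used as a convenience to trim noisy output and focus on the end of the build log. That convenience turned into a trap. A real TypeScript compile failure occurred upstream, but `tail` kept only the last few lines of output, which cut off the actual tsc error context. A shell pipeline also reports the exit status of its last command by default, so unless `pipefail` is set, a failing build piped into `tail` still exits with status 0. The result was a false sense that the build had completed cleanly.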

Worse, that mistake combined with stale artifacts in a way that made the system appear more functional than it really was. Code changes were believed to be deployed, but the running output still reflected an older successful build. That meant debugging was happening against the wrong mental model. The browser automation looked inconsistent when the real issue was simpler: the newest code had not actually built.
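Trimming the log and keeping the failure signal do not have to conflict. The sketch below runs a command, keeps only the last few lines for display, and reports the command's own exit code instead of the exit code of whatever trimmed the output. `runAndTail` and `BuildResult` are hypothetical names. `spawnSync` is Node's standard `child_process` API, and on Windows a `pnpm` invocation may additionally need the `shell` option.

```typescript
import { spawnSync } from "node:child_process";

interface BuildResult {
  ok: boolean;
  exitCode: number | null; // null if the process could not be started or was killed
  tail: string[];
}

// Runs a command to completion and returns its real exit code plus the last N lines.
function runAndTail(command: string, args: string[], lines = 20): BuildResult {
  const result = spawnSync(command, args, {
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024, // build logs can exceed the default buffer
  });
  const output = `${result.stdout ?? ""}${result.stderr ?? ""}`;
  const tail = output
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(-lines);
  return { ok: result.status === 0, exitCode: result.status, tail };
}
```

A caller would do something like `const build = runAndTail("pnpm", ["build"])` and exit non-zero when `build.ok` is false, printing `build.tail` either way.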

This kind of failure is worth documenting because it is common in automation-heavy projects. When the runtime is complicated, teams instinctively blame the complicated part first. But sometimes the problem is embarrassingly ordinary: the build failed, and the failure signal got hidden.

A few practical safeguards came out of that incident:

Do not pipe build output in ways that hide compiler context during active debugging
Fail hard on TypeScript errors and verify exit codes explicitly
Confirm artifact freshness before diagnosing runtime behavior
Separate "build noise reduction" from "build correctness"
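For the artifact-freshness safeguard, a rough check is to compare modification times: if any source file is newer than the built artifact, the running code cannot include that change. The sketch below is a minimal version under that assumption. `findSourcesNewerThanArtifact` is a hypothetical helper, the file paths are whatever the project uses, and the timestamp reader is injectable so the logic can be exercised without touching disk.

```typescript
import { statSync } from "node:fs";

type MtimeReader = (path: string) => number;

// Default reader: the file's last-modified time in milliseconds.
const readMtimeMs: MtimeReader = (path) => statSync(path).mtimeMs;

// Returns the source files modified more recently than the build artifact.
// A non-empty result means the artifact is stale relative to those sources.
function findSourcesNewerThanArtifact(
  artifactPath: string,
  sourcePaths: string[],
  mtimeOf: MtimeReader = readMtimeMs,
): string[] {
  const artifactMtime = mtimeOf(artifactPath);
  return sourcePaths.filter((source) => mtimeOf(source) > artifactMtime);
}
```

Modification times are a heuristic, not proof: a checkout or a touch can change them without changing content. The check is still cheap enough to run before blaming the browser.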

There is no glamorous engineering lesson here, just an important one. Stale builds are especially dangerous in browser automation because the feedback loop is already noisy. Anything that makes it harder to trust the build under test multiplies the confusion.

Frequently Asked Questions

Q: Why use Playwright browser automation instead of a hotel API?

In this case, browser automation was necessary because there was no clean public booking API that exposed the required live workflow. When that happens, the browser becomes the integration surface, and reliability depends on handling UI state, authentication, and anti-bot behavior directly.

Q: Why did `fill()` fail on the date field?

The input behaved like a React-controlled component, where application state is updated through expected event sequences rather than simple DOM mutation. fill() could make the field look populated, but pressSequentially() more reliably triggered the key events the application needed to commit the value. This is a well-documented distinction in both React's controlled-component model and Playwright's input API.

Q: Why not run the agent headless in production?

Because the anti-bot layer rejected headless browsers. The reliable approach was a headed browser with a human-assisted login and persisted authenticated session, rather than trying to bypass the site's protection mechanisms. This also kept the system within the site's terms of use.

Q: What is a GraphQL availability contract in this context?

It is a stable interface between the rest of the agent system and the booking-site adapter. Instead of exposing selectors and page logic everywhere, the adapter returns structured availability data through a defined contract, which keeps browser-specific complexity contained and prevents automation details from leaking into planning logic.

Q: What is the biggest security risk in this setup?

The persisted authenticated browser session is the biggest risk because it represents live access to a real account context. It should be stored outside the repository, tightly access-controlled, and handled with the same seriousness as other credentials. If compromised, an attacker could act within the authenticated session without needing the password or MFA.

Key Takeaways

Browser automation becomes the real integration layer when no usable public API exists.
Interactive, visible Playwright sessions dramatically improve debugging speed over blind CLI runs.
React-controlled inputs may not register fill() even when the DOM appears correct; pressSequentially() can better match the key events the application expects.
Polling by re-navigation destroys in-progress user state; refresh data without resetting the page.
Headless detection can force a headed runtime model on hardened commercial sites.
Persisted browser sessions are sensitive state and must be treated like credentials.
Build hygiene matters as much as automation logic; hidden compiler failures can waste hours of debugging.

Conclusion

The hardest part of Outback in mid-May was not planning logic or ranking options. It was teaching an agent to survive the messy, stateful, adversarial reality of a live booking site. That work clarified a broader truth about production agents: the interesting engineering often starts where clean APIs end. The next post in the arc introduces the two genuinely new faces on the crew, where browser-driven availability becomes input to a more opinionated decision layer rather than the end of the story.