🤖 Ghostwritten by Claude Opus 4.5 · Edited by GPT-5.2 Codex · Curated by Tom Hundley
This is Part 3 of the Professional's Guide to Vibe Coding series. Start with Part 1 if you haven't already.
Reviewing AI Code in Vibe Coding Is Different
Human code review and AI code review require different mindsets.
When reviewing human code, you're looking for intentions that didn't translate correctly—the developer knew what they wanted but made mistakes in expression. You're checking logic, style, and overlooked edge cases.
When reviewing AI code, you're looking for confident incorrectness—code that reads well, follows patterns, and is fundamentally wrong. AI makes different mistakes than humans, and catching them requires different vigilance.
This article is the checklist I've developed after reviewing thousands of AI-generated code blocks.
The AI Code Review Checklist
Category 1: Security
AI code is often insecure by default. Not maliciously—just naively.
Check for input validation:
```javascript
// AI often generates this — string interpolation invites SQL injection:
app.get('/user/:id', (req, res) => {
  const user = db.query(`SELECT * FROM users WHERE id = ${req.params.id}`);
  res.json(user);
});

// When it should be this — validate input and use a parameterized query:
app.get('/user/:id', (req, res) => {
  const id = parseInt(req.params.id, 10);
  if (Number.isNaN(id)) return res.status(400).json({ error: 'Invalid ID' });
  const user = db.query('SELECT * FROM users WHERE id = $1', [id]);
  res.json(user);
});
```
Questions to ask:
- Is user input being interpolated directly into queries or commands?
- Are file paths being validated before access?
- Are secrets hardcoded or exposed in logs?
- Is authentication checked before authorization?
- Are rate limits implemented for public endpoints?
Category 2: Hallucinated Dependencies
AI frequently invents packages that don't exist or uses APIs that have been deprecated.
Common patterns:
- Package names that sound right but don't exist in npm/PyPI
- Method signatures from old versions of libraries
- API endpoints from outdated documentation
- Configuration options that were removed years ago
Verification steps:
- Check if the package exists: `npm view <package>` or search PyPI
- Check if the imported method exists in current version docs
- Verify API endpoints in official documentation
- Test imports before building logic on top
Category 3: Logic Errors
AI excels at producing code that looks correct but fails on edge cases.
Watch for:
- Off-by-one errors in loops and slices
- Incorrect null/undefined handling
- Race conditions in async code
- Comparison operators that fail for edge values
Example:
```javascript
// AI generated:
function getLastItem(arr) {
  return arr[arr.length]; // Off by one—returns undefined
}

// Should be:
function getLastItem(arr) {
  return arr.length > 0 ? arr[arr.length - 1] : undefined;
}
```
Category 4: Architecture Mismatches
AI doesn't know your system's architecture. It generates locally reasonable code that may conflict globally.
Questions to ask:
- Does this follow the patterns established elsewhere in the codebase?
- Is it creating coupling that will make future changes hard?
- Does the abstraction level match similar features?
- Is it introducing inconsistent naming or structure?
Category 5: Missing Error Handling
AI tends toward "happy path" code. It often skips:
- Try/catch blocks around I/O operations
- Fallbacks for failed network requests
- Validation of external data
- Graceful degradation for missing dependencies
Standard practice: After AI generates code, explicitly ask: "What are all the ways this could fail?" Then verify those cases are handled.
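One reusable shape for the fallback and graceful-degradation items above is a small wrapper; `withFallback` is an illustrative helper name, and the load function stands in for whatever I/O call the AI generated:

```javascript
// Wrap an unreliable async operation with a degraded-but-safe fallback.
async function withFallback(loadFn, fallback) {
  try {
    return await loadFn();
  } catch (err) {
    // In real code, log err somewhere observable before degrading.
    return fallback;
  }
}

// Usage sketch: const config = await withFallback(() => fetchConfig(), DEFAULTS);
```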
The Hallucination Detection Protocol
Hallucinations are AI's most insidious failure mode. The code compiles. It runs. It just does something subtly wrong based on fabricated information.
Signs of Hallucination
Confident specificity about unknown things:
- Very specific version numbers (e.g., "as of version 3.4.7")
- Detailed API signatures you can't verify
- "Best practices" you've never heard of from authoritative sources
Made-up documentation references:
- Links to documentation pages that don't exist
- Citations of blog posts that return 404
- References to configuration files with invented schema
Plausible but fictional features:
- Methods that would make sense but don't exist
- Configuration options that seem reasonable but aren't supported
- Integrations between tools that don't actually work together
Verification Steps
- Don't trust—verify. Before building on any AI claim, check primary sources.
- Check version alignment. Is the AI using information from the correct library version?
- Test in isolation. Before integrating, verify the specific feature works.
- When in doubt, ask explicitly. "Is this a real feature or are you uncertain?"
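"Test in isolation" can be as small as running the claimed API on a trivial input before wiring it into real code. A sketch, using `String.prototype.replaceAll` as the feature under suspicion (the probe function is an illustrative pattern, not a library call):

```javascript
// Micro-test an AI-claimed API on a trivial input before building on it:
// does replaceAll exist in this runtime, and does it behave as described?
function probeReplaceAll() {
  if (typeof ''.replaceAll !== 'function') return false;
  return 'a-b-c'.replaceAll('-', '_') === 'a_b_c';
}
```

Thirty seconds in a REPL here beats an hour debugging a hallucinated method deep in a feature branch.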
The Context Drift Problem
Over long conversations or complex tasks, AI loses coherence. This manifests as:
Symptoms of drift:
- Contradictory changes to the same file
- Forgetting project structure established earlier
- Repeating the same mistake after you corrected it
- Generating code that conflicts with earlier generations
Mitigation strategies:
- Use shorter sessions. Start fresh for each major task.
- Provide explicit context. Don't rely on conversation history; restate important constraints.
- Checkpoint frequently. Test and commit working code before continuing.
- Recognize the signs. When coherence breaks down, it's time to restart.
The 60-Second Triage
Not every AI generation needs deep review. Here's my fast-pass checklist:
30 seconds—structural scan:
- Are imports real and necessary?
- Does the overall structure match expectations?
- Are there obvious security patterns missing?
30 seconds—logic scan:
- Do loops have correct bounds?
- Is error handling present?
- Are edge cases addressed?
If the triage passes, proceed to deeper review. If it fails on any point, it's often faster to regenerate with a better prompt than to fix the output.
When to Read Diffs vs. Test Behavior
Read the diff when:
- The change is small and targeted
- You're reviewing security-critical code
- You need to understand the implementation for future maintenance
- The AI is working in an area you don't know well
Test behavior when:
- The change is large and complex
- You're prototyping and correctness matters more than understanding
- Time pressure is high and the risk of failure is low
- The code is throwaway (tests, demos, experiments)
Most production code requires both: test that it works, then review to understand why.
The Security Review Addendum
Because AI doesn't think adversarially, security review requires explicit attention:
Always check for:
- SQL injection via string interpolation
- Command injection in shell executions
- Path traversal in file operations
- Cross-site scripting in rendered output
- Insecure deserialization
- Missing authentication on endpoints
- Hardcoded secrets or credentials
- Overly permissive CORS or permissions
Assume AI code is insecure until you've explicitly verified each attack surface.
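For the cross-site scripting item, the minimum bar is escaping untrusted text before it reaches markup. A sketch (the `escapeHtml` name is mine; in a real app, prefer your template engine's auto-escaping over a hand-rolled helper):

```javascript
// Minimal HTML escaper for untrusted text interpolated into markup.
// Ampersand must be replaced first so later entities aren't re-escaped.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```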
Building Your Review Workflow
Here's the workflow I use:
- Generate with a clear, scoped prompt
- Triage in 60 seconds—reject and regenerate if structurally wrong
- Deep review using the checklist categories above
- Test behavior before committing
- Commit with a clear message that notes AI assistance
- Document any non-obvious decisions for future maintainers
The workflow becomes automatic with practice. The first few months feel slow; eventually it's faster than writing code directly.
The Bottom Line
Reviewing AI code is a skill distinct from writing code or reviewing human code. It requires:
- Systematic checking for AI-specific failure modes
- Healthy skepticism about confident-sounding claims
- Explicit security review that AI won't do for you
- Recognition of context drift and when to restart
The checklist in this article isn't exhaustive—you'll develop your own patterns. But it's a foundation that catches most issues before they reach production.
Next in the series: Building Intuition: What AI Gets Wrong (How to Predict It)
Ready to level up your team's AI development practices?
Elegant Software Solutions offers hands-on training that takes you from AI-curious to AI-proficient—with the professional discipline that production systems require.
👉 Book a consultation