Migrating with a Paper Trail: Legacy as Reference

🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4

When a team rewrites working software, the biggest risk is not syntax errors. It is losing behavior without realizing it. A practical way to reduce that risk is to freeze the old implementation as a read-only legacy reference, add a MIGRATION.md that maps every old file to its new counterpart, and commit both alongside the rewrite. That creates an audit trail for what was preserved, what changed, and what was intentionally removed.

This article explains that methodology and why it works better than relying on commit history alone. It also covers a useful side effect: the review process can expose dead code that looked real because it was imported and configured, but never actually executed.

Why Git Blame Isn't Enough

TL;DR: Git history shows who changed files and when; it does not reliably capture migration intent across a full rewrite.

When you port one small module, reviewers can often keep the before-and-after model in their heads. When you port multiple workflows in a compressed window, that breaks down quickly.

After a large rewrite, git blame and commit history are still useful, but they answer narrower questions than migration reviewers usually have. They can show when a line changed, who touched it, and how a file evolved. They do not, by themselves, answer questions like these:

Which legacy behavior was intentionally preserved?
Which behavior was intentionally dropped?
Which file in the new codebase replaces a specific legacy script?
Was a missing feature removed on purpose or forgotten during the port?

That distinction matters because Git tracks file history, not architectural intent. In a mass port, many new files may share the same author and date, while the old files may be deleted, moved, or archived. Reconstructing the relationship between the two systems later can become slow, error-prone archaeology.

A MIGRATION.md fills that gap. It records the human decisions Git cannot infer.

What a Useful `MIGRATION.md` Looks Like

TL;DR: A simple table mapping legacy paths to new paths, with notes for preserved, changed, and dropped behavior, creates a durable audit trail.

The format does not need to be elaborate. In many cases, a markdown table is enough.

Here is an illustrative example:

| Legacy Path | New Path | Notes |
|---|---|---|
| `legacy-python/sync_transactions.py` | `src/reconciler/sync.ts` | Core loop preserved; HTTP client changed during port |
| `legacy-python/format_report.py` | `src/reports/formatter.ts` | Output format preserved; typed interfaces added |
| `legacy-python/notify_slack.py` | `src/notifications/slack.ts` | Notification logic ported; secrets now loaded from a secrets manager |
| `legacy-bash/cron_wrapper.sh` | `src/scheduler/index.ts` | Scheduling wrapper replaced; retry behavior reviewed during migration |
| `legacy-python/social_post.py` | — | Dropped intentionally: dead code path identified during audit |

The last row is often the most valuable one. If a file disappears without explanation, later reviewers cannot tell whether it was intentionally removed or accidentally omitted. A migration table makes that decision explicit.
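One way to keep the map honest is to check it mechanically. The following is a minimal sketch, assuming the map is stored as a pipe-delimited markdown table whose first two columns are the legacy path and the new path. The function names `parse_migration_table` and `unmapped_legacy_files` are hypothetical helpers invented for this illustration, and the set of "dropped" markers is an assumption about how a team might write an empty cell.

```python
from typing import Optional

# Cell values treated as "no new counterpart". "\u2014" is the long dash
# some tables use for dropped files. This set is an assumption.
DROPPED_MARKERS = {"", "-", "\u2014"}


def parse_migration_table(markdown: str) -> dict[str, Optional[str]]:
    """Return {legacy_path: new_path or None} from a pipe-delimited table."""
    mapping: dict[str, Optional[str]] = {}
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip().strip("`") for cell in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        legacy, new = cells[0], cells[1]
        # Skip the header row and the |---|---| separator row.
        if legacy.lower() == "legacy path" or set(legacy) <= set("-: "):
            continue
        mapping[legacy] = None if new in DROPPED_MARKERS else new
    return mapping


def unmapped_legacy_files(legacy_files: list[str],
                          mapping: dict[str, Optional[str]]) -> list[str]:
    """Legacy files that have no row in the migration table at all."""
    return sorted(set(legacy_files) - set(mapping))


TABLE = """
| Legacy Path | New Path | Notes |
|---|---|---|
| `legacy-python/sync_transactions.py` | `src/reconciler/sync.ts` | Core loop preserved |
| `legacy-python/social_post.py` | - | Dropped intentionally |
"""

legacy_files = [
    "legacy-python/sync_transactions.py",
    "legacy-python/social_post.py",
    "legacy-python/notify_slack.py",  # present on disk, missing from the table
]

print(unmapped_legacy_files(legacy_files, parse_migration_table(TABLE)))
```

A dropped file still counts as mapped here because it has a row. Only files with no row at all are reported, which is the case the table is meant to prevent.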

In practice, the notes column should capture more than path mapping. It should also document behavior changes that matter operationally, such as:

renamed environment variables
dependency swaps
retry or timeout changes
output-format differences
modules intentionally retired during the port

That level of detail turns the document from a checklist into a review artifact.

The Real Payoff: Migration Maps Expose Dead Code

TL;DR: A file-by-file migration review can uncover code that appears integrated but is never actually called.

One of the strongest arguments for this approach is not documentation quality. It is defect discovery.

When reviewers must account for every legacy file, they are forced to ask a simple question repeatedly: what actually invokes this? That question often surfaces gaps that normal development misses.

A common pattern looks like this:

a module exists in the repository
configuration for it exists too
imports suggest it is part of the workflow
but no real execution path ever reaches it

That kind of dead code can survive for a long time because it does not fail loudly.

Tests may not catch it if no test covers the missing invocation path.
Users may not report it if they never knew the feature was expected.
Commit history will not flag it if the code was added intentionally and then left untouched.

A migration audit changes the review posture. Instead of asking only whether the new code compiles and passes tests, it asks whether each legacy module had a real runtime role and whether that role still exists.

That is why the document matters even if nobody reads it six months later. The act of writing it forces a deeper inspection than many rewrites would otherwise get.
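The question "what actually invokes this?" can be partly automated when the legacy code is Python. The sketch below uses the standard library `ast` module to list names that a file imports but never references again. It is a static heuristic under stated assumptions: it does not prove runtime reachability, and it misses dynamic imports and calls made through strings or configuration. The module `config`, the constant `SOCIAL_API_URL`, and the sample source are hypothetical.

```python
import ast


def imported_but_unused(source: str) -> list[str]:
    """Names a module imports but never references anywhere else."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Covers plain references and the base of attribute access,
            # e.g. the "social_post" in "social_post.publish()".
            used.add(node.id)
    return sorted(imported - used)


# Hypothetical legacy entry point: social_post is imported and has a
# config value, but nothing in the file ever calls it.
LEGACY_MAIN = '''
import sync_transactions
import social_post
from config import SOCIAL_API_URL

def main():
    sync_transactions.run()
'''

print(imported_but_unused(LEGACY_MAIN))
```

A name on this list is a prompt for the reviewer, not a verdict. The reviewer still has to confirm whether the module was reached some other way before recording it as dropped.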

Archive Legacy Code Carefully: Reference, Not Runtime

TL;DR: Keeping legacy code beside the rewrite is useful, but it should be treated as read-only reference material and scrubbed before commit.

Freezing legacy code in the same repository can make review easier. Old and new implementations are visible in one place, and reviewers can compare behavior without switching repositories or branches.

But archived code creates two risks if handled carelessly.

First, reviewers may treat it as still runnable. That blurs the boundary between source of truth and historical reference. If legacy scripts remain executable, teams can accidentally keep depending on them.

Second, old code often contains outdated configuration practices. Legacy directories are a common place to find hardcoded credentials, stale tokens, or forgotten .env files.

A safer pattern is straightforward:

Scrub before committing. Replace any credentials or sensitive values with placeholders such as YOUR_API_KEY or REDACTED before the archive enters version control.
Review ignored files manually. Legacy subdirectories may include local config files that are not covered by the current repository's ignore rules.
Mark the archive as non-runtime. Add a short README.md stating that the directory is for reference only and is excluded from build and execution paths.
Exclude it from automation. CI, packaging, and deployment steps should not import, execute, or bundle archived legacy code.

| Risk | Mitigation |
|---|---|
| Plaintext credentials in legacy config files | Secret scan plus manual review before commit |
| Old tokens that may still work | Revoke where applicable and replace with placeholders in the archive |
| Local env files missed by ignore rules | Manual audit of each legacy subdirectory |
| Archived code accidentally executed in CI | Exclude legacy paths from build, test, and deploy workflows |
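The scrubbing step can be sketched for env-style config files as follows. This is an illustration, not a substitute for a dedicated scanner or for manual review: the keyword list in `SECRET_ASSIGNMENT` is an assumption, the function name `scrub` is hypothetical, and the pattern can mangle ordinary source code that happens to assign to a variable named like a secret, so every change it makes should be reviewed.

```python
import re

# Matches assignments such as SLACK_TOKEN="abc" or api_key: abc.
# The keyword list is an illustrative assumption, not an exhaustive rule set.
SECRET_ASSIGNMENT = re.compile(
    r"""\b([A-Z0-9_]*(?:API_KEY|SECRET|TOKEN|PASSWORD)[A-Z0-9_]*)"""  # 1: name
    r"""([ \t]*[:=][ \t]*)"""                                         # 2: separator
    r"""(["']?)"""                                                    # 3: optional quote
    r"""[^"'\s]+"""                                                   # the value itself
    r"""\3""",                                                        # matching close quote
    re.IGNORECASE,
)


def scrub(text: str, placeholder: str = "REDACTED") -> tuple[str, int]:
    """Replace suspected secret values; return (new_text, replacement_count)."""
    def _replace(match: re.Match) -> str:
        name, sep, quote = match.group(1), match.group(2), match.group(3)
        return f"{name}{sep}{quote}{placeholder}{quote}"

    return SECRET_ASSIGNMENT.subn(_replace, text)


# Hypothetical legacy config content with obviously fake values.
LEGACY_ENV = 'SLACK_TOKEN="example-value-123"\nTIMEOUT=30\napi_key: abc123\n'

cleaned, hits = scrub(LEGACY_ENV)
print(hits)
print(cleaned)
```

Running the function a second time on its own output changes nothing of substance, since the placeholder is simply replaced by itself. Any value it redacts should still be treated as exposed and revoked where applicable.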

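One nuance is worth stating carefully: deleting a secret in a later commit does not remove it from earlier commits. It stays reachable in history until that history is rewritten, and even then copies can persist in clones, forks, backups, and whatever the hosting provider retains. The practical takeaway is unchanged: secrets should be removed before the first commit whenever possible, because cleanup after the fact is harder and less reliable, and any credential that was committed should be treated as exposed and revoked.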

When to Keep the Legacy Snapshot

TL;DR: Keep the archived reference at least until the new implementation has proven stable; longer retention is often worth the small storage cost.

A common objection is that archived code creates clutter. Sometimes it does. But for most migration projects, the storage cost of a legacy snapshot is small compared with the value of preserving a precise behavioral reference.

Keeping the snapshot nearby helps with:

regression investigations
parity reviews after production incidents
onboarding engineers who need to understand why the new system behaves a certain way
documenting intentional departures from legacy behavior

That does not mean every archive must live forever. Teams with strict repository hygiene may eventually move old references to a separate archival location after the new implementation has been stable for a meaningful period. The key is not permanent co-location; it is preserving a trustworthy reference long enough for the migration risk window to close.

Frequently Asked Questions

Q: How is a `MIGRATION.md` different from good commit messages?

Commit messages explain changes at the commit level. A MIGRATION.md explains the relationship between the old system and the new one across the whole port. It answers mapping and intent questions that no single commit message can capture well.

Q: Should the frozen legacy directory stay in the repository permanently?

Not necessarily permanently, but long enough to support parity checks, incident review, and post-migration cleanup. Some teams keep it in place for months; others move it to an archive once the rewrite is clearly stable.

Q: Does this approach scale beyond a handful of workflows?

Yes, if each migration remains self-contained. The pattern works best when each workflow, service, or agent has its own migration map rather than one oversized master document.

Q: What tools help with secret scanning?

Common options include gitleaks, detect-secrets, and trufflehog. They are useful for catching obvious patterns, but they should complement manual review rather than replace it.

Q: Why keep the legacy code in the same repository at all?

Because proximity improves review. When old and new code live side by side, reviewers can compare behavior faster and with less context switching. That convenience is often the difference between a superficial review and a thorough one.

Key Takeaways

Use legacy code as reference, not as living runtime code.
Add a MIGRATION.md that maps old files to new ones and explains intentional removals.
Treat the act of writing the map as part of the audit, not as paperwork after the fact.
Scrub secrets before archiving legacy code in version control.
Use the archived snapshot to support parity checks, debugging, and post-migration review.

Conclusion

A migration paper trail does not need heavy tooling to be effective. A frozen legacy snapshot plus a clear MIGRATION.md can make a fast rewrite easier to review, easier to audit, and easier to revisit later.

More importantly, the process improves the migration itself. When reviewers must account for every legacy file, they are more likely to catch dead paths, undocumented behavior changes, and risky assumptions before those issues become production problems.