
🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4 · Curated by Tom Hundley
If you run a multi-agent platform, naming is not cosmetic. It affects incident response, onboarding, observability, and how quickly engineers can understand the system under pressure. Our rule for the Elegant Software Solutions agent platform rebuild is simple: business names in code, codenames for humans.
In practice, that means every repo, log line, metric tag, database table, and API route uses a descriptive business name such as ess-email-triage-agent, not Soundwave. Codenames like Soundwave, Sparkles, and Optimus Prime still exist, but only as conversational shorthand. They do not appear in infrastructure.
That distinction sounds small until something breaks. When names differ across repos, logs, dashboards, and alerts, engineers waste time translating instead of diagnosing. This post explains why we adopted the rule, what the old pattern cost us, and how we enforce it.
TL;DR: Playful codenames helped team culture, but inconsistent identifiers made incidents, log analysis, and onboarding harder than they needed to be.
Before the cleanup, our naming was inconsistent across systems:
| Repo Name | Codename | Group Theme | What It Actually Does |
|---|---|---|---|
| ess-agents-slack-sparkles | Optimus Prime | Autobot Command | Slack operator surface |
| ess-agent-soundwave | Soundwave | Communications | Email triage and drafting |
| ess-agent-harvest | Shockwave | Finance | Bookkeeping automation |
| ess-agent-concierge | Bumblebee | Scouts | General-purpose assistant |
| ess-agent-orchestrator | Ultra Magnus | Command | Fleet orchestration |
| ess-agent-insurance | Ratchet | Support | Insurance monitoring |
It was memorable. It was also operationally expensive.
Picture this: the orchestrator reports all agents healthy, but email triage has processed nothing in four hours. In Slack, the dashboard says "Soundwave: OK." Logs include soundwave-worker, email-triage-main, and ess-agent-soundwave for the same system depending on which layer emitted the event. Sentry references soundwave. Metrics use email_triage_agent. The launchd plist uses yet another identifier.
That kind of inconsistency slows response because engineers first have to answer a basic question: what system are we even talking about? When I wrote about how we stopped building agents and restarted the platform, naming confusion was one of the structural problems that made the old fleet harder to operate than it should have been.
One inconsistent name is annoying. A dozen inconsistent names across repos, logs, metrics, Sentry projects, launchd plists, Slack commands, and database tables creates a much larger coordination problem. In distributed systems practice, consistent service naming is a basic observability and operability requirement because dashboards, alerts, traces, and runbooks all depend on shared identifiers. We were violating that principle at multiple layers.
TL;DR: Business names are the system of record in machine-readable contexts; codenames are a human-only alias layer that never appears in infrastructure.
Here is the concrete rule from our file-based operating model:
Rule 4: Business Names In Code, Codenames For Humans
Operational clarity wins in code, file paths, logs, APIs, and infrastructure.
Human-facing codenames are allowed, but they are secondary.
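- Repositories: `ess-agent-platform`, `ess-email-triage-agent`
- Log tags: `service=ess-email-triage`
- Metric labels: `agent_name="ess-email-triage"`
- Sentry projects: `ess-email-triage`
- Database tables: `email_triage_runs`, not `soundwave_runs`
- launchd labels: `com.ess.agent.email-triage`
- API routes: `/agents/email-triage/status`
- Environment variables: `ESS_EMAIL_TRIAGE_API_KEY`
- Log messages: `"ess-email-triage failed to connect to IMAP"`
- Codenames: stored in a `codename` field as metadata, not as a primary key

We keep a single canonical mapping file in the monorepo: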
```yaml
# agents/naming.yaml
agents:
  ess-slack-operator:
    codename: Sparkles
    transformer_alias: Optimus Prime
    description: Slack-based operator control surface
    group: platform-kernel
  ess-email-triage:
    codename: Soundwave
    transformer_alias: Soundwave
    description: Email triage, classification, and draft responses
    group: communications
  ess-orchestrator:
    codename: Orchestrator
    transformer_alias: Ultra Magnus
    description: Fleet health, heartbeat tracking, task routing
    group: platform-kernel
  ess-bookkeeping:
    codename: Harvest
    transformer_alias: Shockwave
    description: QuickBooks Online reconciliation and categorization
    group: finance
```

This file is the single source of truth. If someone needs to translate between codename and business name, there is exactly one place to look.
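To make the manifest useful to tooling, a script can resolve any codename or alias back to its business name and derive each machine identifier from that one key. The sketch below assumes Python (the post does not say what language the platform uses) and mirrors naming.yaml as a dict literal to stay dependency-free. The derivation rules, such as stripping the `ess-` prefix for table names, launchd labels, and routes, are hypothetical and simply reverse-engineered from the example identifiers listed earlier.

```python
from __future__ import annotations

# Hypothetical in-memory copy of naming.yaml, keeping only the fields this
# sketch needs. A real tool would load the file with a YAML parser instead.
NAMING = {
    "agents": {
        "ess-slack-operator": {"codename": "Sparkles", "transformer_alias": "Optimus Prime"},
        "ess-email-triage": {"codename": "Soundwave", "transformer_alias": "Soundwave"},
        "ess-orchestrator": {"codename": "Orchestrator", "transformer_alias": "Ultra Magnus"},
        "ess-bookkeeping": {"codename": "Harvest", "transformer_alias": "Shockwave"},
    }
}


def resolve(name: str) -> str:
    """Translate a business name, codename, or alias into the business name."""
    agents = NAMING["agents"]
    if name in agents:
        return name
    wanted = name.casefold()
    for business_name, entry in agents.items():
        aliases = {entry["codename"].casefold(), entry["transformer_alias"].casefold()}
        if wanted in aliases:
            return business_name
    raise KeyError(f"unknown agent name: {name!r}")


def identifiers(business_name: str) -> dict[str, str]:
    """Derive every machine identifier from one business name (hypothetical scheme)."""
    if business_name not in NAMING["agents"]:
        raise KeyError(f"not a registered business name: {business_name!r}")
    slug = business_name.removeprefix("ess-")  # "ess-email-triage" -> "email-triage"
    return {
        "log_tag": f"service={business_name}",
        "metric_label": f'agent_name="{business_name}"',
        "table": slug.replace("-", "_") + "_runs",
        "launchd_label": f"com.ess.agent.{slug}",
        "api_route": f"/agents/{slug}/status",
        "env_var": business_name.upper().replace("-", "_") + "_API_KEY",
    }
```

With this scheme, `resolve("Soundwave")` returns `"ess-email-triage"` and `identifiers("ess-email-triage")["env_var"]` returns `"ESS_EMAIL_TRIAGE_API_KEY"`. No identifier is computed from a codename, so editing a codename in the manifest never changes anything a machine reads.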
TL;DR: Consistent business naming supports monorepo clarity because you cannot consolidate services cleanly if their identities change from tool to tool.
The naming convention is not a standalone decision. It is tightly coupled to the monorepo restart strategy we are executing. When everything lives in ess-agent-platform, the directory structure needs to be immediately legible:
```
ess-agent-platform/
├── platform/
│   ├── control-plane/
│   ├── worker-runtime/
│   └── operator-surface/
├── agents/
│   ├── ess-email-triage/
│   ├── ess-bookkeeping/
│   ├── ess-slack-operator/
│   └── ess-insurance-monitor/
├── adapters/
│   ├── slack/
│   ├── email/
│   └── imessage/
└── config/
    └── naming.yaml
```

A new engineer looking at this tree can infer what each agent does. Compare that with a layout full of codenames that only make sense if you already know the backstory.
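A small check could keep the tree and the manifest from drifting apart. This Python sketch lists any directory under `agents/` that has no entry in `naming.yaml`. The function name and the one-directional design are assumptions: the manifest excerpt above also registers `ess-orchestrator`, which does not appear under `agents/` in the tree, so a strict two-way comparison would misfire.

```python
from __future__ import annotations

from pathlib import Path


def unregistered_agent_dirs(repo_root: Path, manifest_keys: set[str]) -> list[str]:
    """Return names of directories under agents/ that have no naming.yaml entry.

    Only directories are checked; files such as a README are ignored. The
    check is one-directional because the manifest may also register
    services that live outside agents/.
    """
    agents_dir = repo_root / "agents"
    return sorted(
        entry.name
        for entry in agents_dir.iterdir()
        if entry.is_dir() and entry.name not in manifest_keys
    )
```

Run in CI, a non-empty result could fail the build, which turns "every agent has a business name" from a convention into a gate.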
Our restart plan says we stay in a monorepo until the system earns the right to split. That means the control plane, worker contract, and operator surface need to stabilize before any agent gets its own repo. Business naming helps because you can refactor directory boundaries without also refactoring every log line, metric, and service identifier that used a codename.
TL;DR: The hardest part was not renaming code. It was finding every place codenames had leaked into infrastructure and changing habits without creating avoidable outages.
I expected the grep-and-replace to be the hard part. It was not. Here is what actually tripped us up.
We found codename references in launchd plists, database schemas, Sentry project names, and log format strings. Each one required careful sequencing to avoid breaking a running agent.
For security reasons, I am intentionally not listing real secret paths, project identifiers, or internal infrastructure values here. The important point is the pattern: naming leaks into more places than most teams remember.
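Some of that hunting can be automated. The following Python sketch builds a case-insensitive pattern from a list of codenames and reports each hit with its line number. The codename list is hypothetical (a real tool would read the `codename` and `transformer_alias` fields from naming.yaml), and it deliberately omits `Orchestrator` because that word is also part of the business name `ess-orchestrator`. Common words such as `Harvest` will still produce false positives, so treat hits as candidates for review rather than automatic failures.

```python
from __future__ import annotations

import re

# Hypothetical codename list. "Orchestrator" is left out because it also
# appears in the business name ess-orchestrator.
CODENAMES = ["Sparkles", "Optimus Prime", "Soundwave", "Harvest", "Shockwave", "Ultra Magnus"]


def codename_regex(codenames: list[str]) -> re.Pattern[str]:
    """Build a case-insensitive pattern that matches any codename as a token.

    Words in a multi-word codename may be separated by whitespace, "-", "_",
    or nothing, so "Optimus Prime", "optimus_prime", and "OptimusPrime" all
    match. A letter or digit may not touch either end of a match, but "_",
    "-", and "." may, so soundwave_runs and ess-agent-soundwave are caught.
    """
    alternatives = [
        r"[\s_-]?".join(re.escape(word) for word in name.split())
        for name in sorted(codenames, key=len, reverse=True)  # longest first
    ]
    body = "|".join(alternatives)
    return re.compile(rf"(?<![A-Za-z0-9])(?:{body})(?![A-Za-z0-9])", re.IGNORECASE)


def find_leaks(text: str, pattern: re.Pattern[str]) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for every codename hit in text."""
    return [
        (lineno, match.group(0))
        for lineno, line in enumerate(text.splitlines(), start=1)
        for match in pattern.finditer(line)
    ]
```

Pointing `find_leaks` at plist files, schema migrations, and logging configuration covers the surfaces mentioned above; the pattern intentionally misses camel-case compounds like `SoundwaveWorker`, so a manual pass is still worthwhile.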
The team, especially me, kept saying "Sparkles" in Slack during incidents. Old habits are hard to break. We did not ban codenames in conversation because that would have been annoying and unnecessary. Instead, we made the naming.yaml translation file easy to find and treated business names as mandatory everywhere machines read or emit identifiers.
The first time I got a Sentry alert that said ess-email-triage: IMAP connection timeout instead of Soundwave: connection error, the difference was immediate. I knew what system was affected, what dependency was failing, and where to start looking. Removing the translation step made the alert more actionable.
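One way to get that property by default is to bind the business name when a logger is created, so no call site can forget it or substitute a codename. This is a sketch using Python's standard `logging` module, which is an assumption about the stack; the `service=` field mirrors the log tag convention above, while the function name and format string are hypothetical. Sentry and other alerting tools have their own naming configuration that this does not cover.

```python
from __future__ import annotations

import logging
from typing import Optional, TextIO


def make_logger(business_name: str, stream: Optional[TextIO] = None) -> logging.LoggerAdapter:
    """Configure a logger whose every record carries the agent's business name.

    When stream is None, StreamHandler writes to sys.stderr.
    """
    logger = logging.getLogger(business_name)
    logger.handlers.clear()   # reconfiguring replaces, rather than stacks, handlers
    logger.propagate = False  # keep lines from being emitted twice via the root logger
    logger.setLevel(logging.INFO)

    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s service=%(service)s %(message)s")
    )
    logger.addHandler(handler)

    # LoggerAdapter attaches {"service": ...} to each record as `extra`,
    # so the %(service)s placeholder always resolves.
    return logging.LoggerAdapter(logger, {"service": business_name})
```

For example, `make_logger("ess-email-triage").error("ess-email-triage: IMAP connection timeout")` writes `ERROR service=ess-email-triage ess-email-triage: IMAP connection timeout` to stderr.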
TL;DR: Naming conventions are an architectural choice because they shape how quickly people can reason about the system when something goes wrong.
This naming decision is part of a larger principle in the rebuild: operational clarity is not decoration; it is architecture. The same thinking drives our file-based memory system, where we decided that if a decision is not written to a tracked file, it is not durable enough to rely on.
Similarly, if a service name does not tell you what it does, your state management and observability are weaker before you write a single monitoring rule. The name is the first layer of observability.
The three pillars of operational clarity in a multi-agent platform are:
We violated all three in the old fleet. The rebuild addresses all three, starting with naming because it is one of the cheapest fixes with the broadest operational payoff.
Use business names in all machine-readable contexts: code, logs, metrics, configs, database schemas, and API routes. Codenames are fine for conversation and narrative documentation, but they should not be the primary identifier in infrastructure. Keep a single translation file so anyone can look up the mapping without relying on tribal knowledge.
Sequence matters. Start with low-risk surfaces such as documentation, dashboard labels, and alert names, then move inward toward application configs and database objects. Avoid renaming multiple critical identifiers in the same change set. Temporary aliases can help during the transition, but they should be time-boxed and removed once the migration is complete.
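Time-boxing an alias is easier to enforce when the expiry lives in code. In this minimal Python sketch, with a hypothetical alias table, looking up an old identifier returns the new one and emits a `DeprecationWarning` until the last allowed day, then raises so the alias cannot quietly become permanent. Python hides `DeprecationWarning` by default outside `__main__`, so you may want to enable it in tests or staging.

```python
from __future__ import annotations

import warnings
from datetime import date
from typing import Optional


def resolve_identifier(
    name: str,
    aliases: dict[str, tuple[str, date]],
    today: Optional[date] = None,
) -> str:
    """Map a deprecated identifier to its replacement until the alias expires."""
    today = today or date.today()
    if name not in aliases:
        return name  # already current, or not covered by the transition table
    new_name, last_day = aliases[name]
    if today > last_day:
        raise LookupError(f"alias {name!r} expired after {last_day}; use {new_name!r}")
    warnings.warn(
        f"{name!r} is a deprecated alias for {new_name!r} (removed after {last_day})",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name


# Example transition table: old identifier -> (new identifier, last day honored).
# The dates are placeholders, not a real migration schedule.
TRANSITION_ALIASES = {
    "soundwave_runs": ("email_triage_runs", date(2025, 3, 31)),
    "soundwave-worker": ("ess-email-triage", date(2025, 3, 31)),
}
```

Raising after the deadline is a deliberate choice: a forgotten alias surfaces as a visible failure in staging instead of lingering as a second name for the same system.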
Yes. Frameworks generally let you assign your own identifiers to agents, nodes, or workflows. Use the business name as the stable machine identifier and keep the codename in your own metadata layer. The framework usually does not care what string you choose, but your operators and future maintainers will.
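As an illustration, assuming a Python codebase, an agent definition could make the business name its only identifier and push codenames into a free-form metadata map. The `AgentSpec` class and the `ess-` naming pattern below are hypothetical, not any particular framework's API; the point is that validation happens where the identifier is first defined.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical convention: "ess-" prefix, then lowercase words joined by hyphens.
BUSINESS_NAME_PATTERN = re.compile(r"ess-[a-z0-9]+(?:-[a-z0-9]+)*")


@dataclass
class AgentSpec:
    agent_id: str  # stable machine identifier: always the business name
    metadata: dict[str, str] = field(default_factory=dict)  # codenames live here

    def __post_init__(self) -> None:
        # Reject codenames or other ad hoc strings as the primary identifier.
        if not BUSINESS_NAME_PATTERN.fullmatch(self.agent_id):
            raise ValueError(
                f"agent_id must be a business name such as 'ess-email-triage', "
                f"got {self.agent_id!r}"
            )
```

`AgentSpec("ess-email-triage", {"codename": "Soundwave"})` is accepted, while `AgentSpec("Soundwave")` raises `ValueError`, so the codename can only ever be metadata.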
Yes, indirectly. Naming does not determine repository strategy by itself, but inconsistent naming makes either strategy harder to operate. In a monorepo, clear names make the directory tree navigable. In a multi-repo setup, clear names make ownership, alerts, and deployment targets easier to understand.
Name the agent after its primary business function and document secondary responsibilities in the README and the naming.yaml manifest. If an agent truly serves two distinct domains equally, that is often a sign it should be split into two agents with a shared library or shared runtime components.
Keep a single naming.yaml file as the canonical mapping between business names and codenames.

Naming conventions seem minor until they are the reason an alert takes ten extra minutes to understand. If you operate multiple agents or services, descriptive business names reduce ambiguity, speed up onboarding, and make incidents easier to manage.
If your platform still relies on lore-heavy codenames in production systems, start by defining one canonical naming rule and one translation file. It is a small architectural decision with outsized operational benefits.
If you are working through similar platform cleanup, explore more of the ESS engineering blog for patterns on monorepos, state management, and agent operations.