
Part 2 of 3
Three frontier models from three labs in twelve days. This is what an arms race looks like.
November 2025 will be studied in business schools for decades. Between November 12th and November 24th, all three major U.S. AI labs—OpenAI, Google, and Anthropic—released flagship models within days of each other. The competitive pressure was palpable. Each release felt like a direct response to the one before it.
Here's the timeline:
For enterprise decision-makers, the message is clear: the model layer is commoditizing faster than anyone predicted.
OpenAI led with GPT-5.1 Instant and GPT-5.1 Thinking—positioning them as smarter, warmer, more conversational successors to GPT-5.
GPT-5.1 Instant is OpenAI's new default model. Key improvements:
GPT-5.1 Thinking is the advanced reasoning model, now faster on simple tasks and more persistent on complex ones. It includes a "no reasoning" mode for tasks that don't require deep thought.
Developer highlights:
- apply_patch and shell tools in the Responses API

One week after the initial launch, and one day after Google's Gemini 3 announcement, OpenAI released GPT-5.1-Codex-Max, an agentic coding model built for long-running tasks.
This is significant: Codex-Max is OpenAI's first model natively trained to operate across multiple context windows through a process called compaction, letting it work coherently over millions of tokens in a single task.
Per OpenAI: Codex-Max has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging.
For enterprise engineering teams, this changes the calculus on what can be delegated to AI.
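Conceptually, compaction resembles an agent loop that compresses its own history when the context window fills, then continues with the summary plus the most recent steps. OpenAI has not published how Codex-Max implements this; the sketch below is only a conceptual analogy, and `summarize()` is a hypothetical stand-in for a model-driven compression step.

```python
CONTEXT_LIMIT = 8  # messages; tiny on purpose, for demonstration

def summarize(messages):
    # Placeholder: a real system would ask the model to compress its own history.
    return f"[summary of {len(messages)} earlier steps]"

def compact(transcript):
    """Replace older messages with a summary once the limit is reached."""
    if len(transcript) < CONTEXT_LIMIT:
        return transcript
    keep = transcript[-2:]                      # retain the most recent steps
    return [summarize(transcript[:-2])] + keep  # one summary + recent context

transcript = [f"step {i}" for i in range(10)]
print(compact(transcript))  # ['[summary of 8 earlier steps]', 'step 8', 'step 9']
```

The payoff of a loop like this is that task length is no longer bounded by any single window, which is what makes 24-hour agentic runs plausible.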
Google's timing, six days after OpenAI's GPT-5.1 and one day before Codex-Max, felt deliberate. Gemini 3 Pro launched as their most capable model ever, deployed immediately across Search, the Gemini app, AI Studio, Vertex AI, Gemini CLI, and the Antigravity IDE.
Benchmark dominance:
Gemini 3 Deep Think mode pushes further, using parallel reasoning to explore multiple hypotheses simultaneously.
Deep Think results:
For context: Deep Think builds on variants that achieved gold-medal standard at both the International Mathematical Olympiad and International Collegiate Programming Contest World Finals.
Availability: Deep Think is currently limited to Google AI Ultra subscribers ($250/month), with Google citing additional safety evaluation time before broader rollout.
Anthropic's response came six days after Gemini 3. Claude Opus 4.5 launched with a clear message: we're still the best for real-world coding.
Headline numbers:
The positioning was explicit: this is Anthropics answer to GPT-5.1-Codex-Max and Gemini 3.
Alongside Opus 4.5, Anthropic made Claude for Chrome and Claude for Excel more broadly available—signaling a push beyond the API into everyday enterprise tools.
They also announced an upgraded plan mode for Claude Code and Claude Code support in the desktop app. The message: we're not just a model company, we're a productivity platform.
All three frontier models are now in the same performance tier for most enterprise tasks. The differentiation is increasingly about:
For enterprise teams evaluating models:
| Use Case | Recommendation |
|---|---|
| General productivity | GPT-5.1 Instant (speed + cost) |
| Complex reasoning | Gemini 3 Deep Think or GPT-5.1 Thinking |
| Long-running code tasks | GPT-5.1-Codex-Max |
| Real-world coding benchmarks | Claude Opus 4.5 |
| Computer use / automation | Claude Opus 4.5 |
| Cost-sensitive workloads | Claude Haiku 4.5 (October release) |
The enterprises winning at AI in 2026 won't be OpenAI shops or Anthropic shops. They'll be orchestrating multiple models based on task requirements:
The orchestration layer—the system that routes tasks to the right model—is becoming the strategic asset.
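In its simplest form, that orchestration layer is a routing table keyed on task type, mirroring the recommendation table above. This is a minimal sketch, not a production router, and the model identifier strings are placeholders rather than the labs' actual API model names.

```python
from dataclasses import dataclass

# Placeholder identifiers -- real provider APIs use their own model name strings.
ROUTING_TABLE = {
    "general_productivity": "gpt-5.1-instant",
    "complex_reasoning": "gemini-3-deep-think",
    "long_running_code": "gpt-5.1-codex-max",
    "real_world_coding": "claude-opus-4.5",
    "computer_use": "claude-opus-4.5",
    "cost_sensitive": "claude-haiku-4.5",
}

@dataclass
class Task:
    kind: str
    prompt: str

def route(task: Task, default: str = "gpt-5.1-instant") -> str:
    """Return the model to dispatch this task to, falling back to a default."""
    return ROUTING_TABLE.get(task.kind, default)

print(route(Task("long_running_code", "refactor the billing module")))
# -> gpt-5.1-codex-max
```

A production version would layer provider SDK calls, cost tracking, and fallback logic behind this lookup, but the strategic point stands: the routing policy, not any single model, is the asset you own.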
Speed matters: Each lab felt compelled to respond within days. The fear of being last is real.
Coding is the battleground: All three labs led with coding benchmarks. Enterprise engineering teams are the beachhead.
Agentic capabilities are table stakes: Long-context, multi-step reasoning, computer use—these are now expected, not differentiating.
Safety is a competitive dimension: Anthropic continues to emphasize alignment. Google delayed Deep Think for safety testing. This matters to enterprise buyers.
The pace isn't slowing. December is already rumored to bring GPT-5.2 from OpenAI (focused on speed and stability). Google is rolling Deep Think out to broader audiences. Anthropic hasn't announced what's next, which usually means something is coming.
For enterprise leaders: Lock in your model evaluation frameworks now. The next wave is already building.
This article is a live example of the AI-enabled content workflow we build for clients.
| Stage | Who | What |
|---|---|---|
| Research | Claude Opus 4.5 | Analyzed current industry data, studies, and expert sources |
| Curation | Tom Hundley | Directed focus, validated relevance, ensured strategic alignment |
| Drafting | Claude Opus 4.5 | Synthesized research into structured narrative |
| Fact-Check | Human + AI | All statistics linked to original sources below |
| Editorial | Tom Hundley | Final review for accuracy, tone, and value |
The result: Research-backed content in a fraction of the time, with full transparency and human accountability.
We're an AI enablement company. It would be strange if we didn't use AI to create content. But more importantly, we believe the future of professional content isn't AI vs. Human; it's AI amplifying human expertise.
Every article we publish demonstrates the same workflow we help clients implement: AI handles the heavy lifting of research and drafting, humans provide direction, judgment, and accountability.
Want to build this capability for your team? Let's talk about AI enablement →