
Software development is splitting into three distinct paradigms, and the newest one barely involves writing code. That's the core argument Andrej Karpathy, former Tesla AI director and an OpenAI founding member, made in his June 17, 2025 YC AI Startup School keynote, billed as "Software Is Changing (Again)". The framework: Software 1.0 is classical code written by humans, Software 2.0 is neural networks trained on data, and Software 3.0 is behavior shaped through natural-language prompts to large language models. For executives, this isn't taxonomy; it's a working roadmap for how teams, products, and competitive positioning will reshape over the next three to five years.
Karpathy's standing here is hard to overstate. He led Tesla's Autopilot vision team, helped shape OpenAI's early research culture, taught one of the most widely watched deep-learning courses at Stanford, and now runs Eureka Labs, the AI-native education company he founded in mid-2024 around the LLM101n course.
TL;DR: In Software 3.0 the "program" is a prompt (natural-language instructions that shape LLM behavior), replacing traditional code for a growing class of use cases.
Karpathy's three-layer model gives executives a clean mental scaffold for where AI fits in the stack:
| Paradigm | How You Program | Who Programs | Example |
|---|---|---|---|
| Software 1.0 | Write explicit code (Python, Java, C++) | Software engineers | Business logic, APIs, databases |
| Software 2.0 | Train neural networks with data | ML engineers | Image recognition, recommendation engines |
| Software 3.0 | Write prompts in natural language | Anyone with domain expertise | Content generation, analysis, customer interaction |
The crucial insight isn't that 3.0 replaces the others: all three coexist, and knowing which paradigm fits which problem is itself a strategic competency. Payments still need 1.0 determinism. Computer vision still needs 2.0 training. But customer-support triage, contract review, or personalized onboarding? Increasingly 3.0 territory.
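To make the contrast concrete, here is a minimal sketch of the same task (support-ticket triage) written in both paradigms. Everything here is illustrative: `call_llm` is a hypothetical stand-in for any LLM client, and the category names are invented for the example.

```python
# Software 1.0: explicit, deterministic rules written by an engineer.
def triage_v1(ticket_text: str) -> str:
    text = ticket_text.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "crash" in text or "error" in text:
        return "engineering"
    return "general"

# Software 3.0: the "program" is a natural-language prompt; the logic
# lives in the instructions, not in branching code.
TRIAGE_PROMPT = """You are a support triage assistant.
Classify the ticket below into exactly one of: billing, engineering, general.
Reply with only the category name.

Ticket: {ticket}"""

def triage_v3(ticket_text: str, call_llm) -> str:
    # call_llm is a hypothetical stand-in for any LLM client.
    return call_llm(TRIAGE_PROMPT.format(ticket=ticket_text)).strip().lower()
```

Note what changed: extending `triage_v1` means an engineer edits branching logic; extending `triage_v3` means anyone who can describe the new category edits a sentence.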
If the program is a prompt, the differentiating skill isn't syntax; it's domain expertise and clear communication. The people best positioned to build 3.0 applications aren't necessarily senior engineers; they may be product managers, operations leads, or subject-matter experts. Engineers don't disappear (someone still has to build the substrate and own the 1.0 and 2.0 layers), but the contribution surface for non-engineers expands meaningfully.
Worth noting the governance contrast: Karpathy theorizes collaborative human-AI engineering as a clean abstraction, while The New Yorker's April 7, 2026 Sam Altman exposé portrays erosion of internal accountability at OpenAI itself. The framework is timely; the institutions building 3.0's substrate are still catching up.
TL;DR: LLMs don't fail gracefully or predictably; they exhibit "jagged intelligence," excelling at some tasks and failing unexpectedly at others. Working with them requires what Karpathy calls "LLM psychology."
The most practically useful concept in the talk is jagged intelligence: LLMs can draft a sophisticated legal brief in seconds, then fumble basic arithmetic, or implement a complex algorithm flawlessly while hallucinating a library that doesn't exist. The competence boundary isn't smooth; it's jagged and context-dependent.
Karpathy pairs this with LLM psychology: the discipline of understanding how these models actually behave rather than how we assume they should. Traditional debugging follows deterministic logic: wrong output, trace the path. LLM debugging is closer to managing a brilliant but inconsistent collaborator. You need to understand where the model's competence boundary sits for your tasks, how small prompt changes shift behavior, and which outputs demand independent verification.
The 2023 paper Navigating the Jagged Technological Frontier, by Dell'Acqua and co-authors at Harvard Business School, Wharton, MIT, and Warwick, run as a field experiment with Boston Consulting Group consultants, captured this dynamic empirically. Across the experiment, consultants using a then-current GPT-4 on tasks inside the model's frontier produced work rated meaningfully higher in quality than the control group. On tasks deliberately chosen to fall outside the frontier, GPT-4 users were materially less likely to reach correct answers. Mapping that frontier for your specific use cases is the difference between AI that compounds value and AI that compounds risk.
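What does "mapping the frontier" look like in practice? One plausible starting point, sketched below under illustrative assumptions (`call_llm` is again a hypothetical LLM client, and the task format is invented for the example): run the model over tasks with known-good answers and record pass rates per category.

```python
# A minimal frontier-mapping harness: score the model against tasks with
# known answers, grouped by category, to see where it holds up.
from collections import defaultdict

def map_frontier(tasks, call_llm):
    """tasks: list of dicts with 'category', 'prompt', and a 'check'
    callable that returns True if the model's answer is acceptable."""
    results = defaultdict(lambda: {"pass": 0, "fail": 0})
    for task in tasks:
        answer = call_llm(task["prompt"])  # hypothetical LLM client
        outcome = "pass" if task["check"](answer) else "fail"
        results[task["category"]][outcome] += 1
    # Categories with high pass rates sit inside the frontier; low pass
    # rates mark territory that still needs 1.0 logic or human review.
    return dict(results)
```

The point of the exercise isn't the harness itself but the map it produces: a per-category view of where the jagged boundary falls for your workloads, not for a benchmark.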
TL;DR: Karpathy followed the talk by sketching AI-built personal knowledge bases, a concept that circulated widely on X and reads as a textbook 3.0 application.
Karpathy didn't only theorize. Not long after the YC keynote he posted a GitHub Gist sketching personal knowledge bases built and maintained by AI agents, which circulated widely on X across the developer and AI-research community.
The idea is a textbook 3.0 application: instead of a traditional tool with schemas, CRUD, and search indexes, an LLM continuously processes and surfaces personal information through natural-language interaction. The "code" is the prompt architecture and the data flow.
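As a rough illustration of what such a prompt architecture might look like (this layout is an assumption for exposition, not the structure of Karpathy's actual Gist; `call_llm` remains a hypothetical client), the "schema" becomes a standing prompt and ingestion becomes a natural-language rewrite rather than an INSERT statement:

```python
# Illustrative sketch of an AI-maintained knowledge base: the entire
# "database" is one evolving document, and the LLM does the maintenance.
INGEST_PROMPT = """You maintain a personal knowledge base as plain Markdown.
Given the existing notes and a new piece of information, return the updated
notes: merge duplicates, link related entries, and keep a consistent format.

Existing notes:
{notes}

New information:
{item}"""

QUERY_PROMPT = """Answer the question using only the notes below.
If the notes don't contain the answer, say so.

Notes:
{notes}

Question: {question}"""

class PromptKnowledgeBase:
    def __init__(self, call_llm, notes: str = ""):
        self.call_llm = call_llm  # hypothetical stand-in for any LLM client
        self.notes = notes

    def ingest(self, item: str) -> None:
        # No schema or CRUD layer: the LLM rewrites the document itself.
        self.notes = self.call_llm(
            INGEST_PROMPT.format(notes=self.notes, item=item))

    def query(self, question: str) -> str:
        return self.call_llm(
            QUERY_PROMPT.format(notes=self.notes, question=question))
```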
The broader takeaway: what gains traction now isn't incremental improvement, it's reconception of what software is.
TL;DR: The framework is widely respected but not unchallenged. Determinism, reliability, and regulation are areas where 1.0 and 2.0 aren't going anywhere.
The determinism problem. Regulated industries (healthcare, finance, aerospace) often require deterministic, auditable systems. 3.0 is probabilistic. A prompt that works 97% of the time still fails 3% of the time, and in some domains that's unacceptable. Karpathy implicitly concedes this by framing the paradigms as coexisting; the practical work is in deciding which paradigm goes where.
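One common way to split that decision, sketched below under the same illustrative assumptions as earlier (`call_llm` is a hypothetical client, the categories are invented): let 3.0 propose and let 1.0 validate, failing over to a human when the deterministic check rejects the output.

```python
# "3.0 proposes, 1.0 disposes": the LLM output is accepted only if it
# passes a deterministic validator; otherwise it fails to human review.
ALLOWED_CATEGORIES = {"billing", "engineering", "general"}  # illustrative

def classify_with_guardrail(ticket: str, call_llm) -> str:
    answer = call_llm(
        "Classify this ticket as billing, engineering, or general. "
        f"Reply with one word.\n\n{ticket}")
    category = answer.strip().lower()
    if category in ALLOWED_CATEGORIES:  # deterministic 1.0 check
        return category
    return "human_review"               # fail closed, not silently
```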
The talent question. Traditional ML engineering remains essential โ fine-tuning, evaluation, infrastructure, the entire 2.0 layer. But the center of gravity for application-layer roles is shifting. Companies hiring heavily into traditional ML without a 3.0 strategy may find themselves overbuilt for yesterday's paradigm.
The moat question. If anyone can prompt an LLM, where's the advantage? Karpathy points in three directions: proprietary data, domain-specific prompt architectures, and the integration substrate connecting 3.0 components to existing 1.0 and 2.0 systems. The prompt is easy to copy. The system around it is not.
Software 3.0: a development paradigm in which applications are built through natural-language prompts to LLMs rather than through traditional code (1.0) or trained neural networks (2.0). The program becomes a prompt, which makes domain expertise as load-bearing as engineering skill.
Jagged intelligence: the uneven capability boundary of LLMs, strong on some hard tasks and surprisingly weak on some easy ones. Failures aren't graceful; managing them requires structured prompting, validation, and human checkpoints.
Does 3.0 replace traditional code? No. Karpathy frames the three paradigms as coexisting. Traditional code remains essential for deterministic systems, infrastructure, integrations, and regulated applications. The argument is that a growing class of applications, those involving language, analysis, and unstructured data, is better served by 3.0.
Karpathy's framework isn't a forecast; it describes what's already happening at leading tech companies. The question for mid-market executives isn't whether the shift is real, but how quickly it reaches your industry. And the cadence is unforgiving: roughly two weeks after this analysis posts, OpenAI ships GPT-5.5 ("Spud") on April 23, 2026, a concrete capability jump that raises the ceiling on what 3.0 systems can do. The abstraction was timely; the substrate is still moving.
The practical first step is an honest assessment: which products, workflows, and internal tools are candidates for 3.0 approaches? Where do you actually have domain expertise to build prompt-based applications? And where do you still need 1.0 reliability?