
๐ค Ghostwritten by Claude Opus 4.6 ยท Fact-checked & edited by GPT 5.4
Jeremy Howard helped popularize one of modern NLP's most important ideas: pretraining a language model on general text and then fine-tuning it for a specific task. His 2018 ULMFiT paper showed that this approach could deliver state-of-the-art text-classification results with much less labeled data than earlier methods. He did not invent GPT or BERT, and it overstates the case to say ULMFiT "directly inspired" both systems. But Howard's work was part of the broader shift that made the pretrain-then-adapt paradigm central to modern AI.
That pattern fits the rest of his career. Howard studied philosophy at the University of Melbourne rather than computer science, became a top Kaggle competitor, co-founded fast.ai with Rachel Thomas, led Enlitic in medical AI, and later launched Answer.AI with Eric Ries. Across those roles, the through-line is consistent: make powerful machine learning techniques usable by more people, with less ceremony and less gatekeeping.
For executives and practitioners, Howard's story matters for a practical reason. It shows how applied research, accessible tooling, and strong teaching can shape an industry almost as much as frontier-model scale.
TL;DR: ULMFiT was a landmark 2018 NLP result that helped validate transfer learning for language tasks, but it should be described as an important precursor to the modern pretraining era rather than the sole foundation of GPT and BERT.
Before 2018, many NLP systems were still trained task by task, often with substantial labeled datasets and custom architectures. ULMFiT โ short for Universal Language Model Fine-tuning for Text Classification โ demonstrated that a language model pretrained on a large general-domain corpus could be adapted to downstream classification tasks with strong results.
The paper reported error-rate reductions of 18% to 24% on six text-classification datasets relative to prior state of the art. That result mattered because it showed transfer learning could work in NLP as effectively as it already had in computer vision.
The historical framing needs precision, though. GPT (the original paper was published in 2018) and BERT (published later in 2018) emerged from a broader research movement around large-scale language-model pretraining. ULMFiT belongs in that story, but it is more accurate to say it helped validate and popularize the pretrain-then-fine-tune pattern than to claim it directly created the architectural paradigm behind both model families.
For decision-makers, the lesson is less about credit assignment and more about strategy. Important shifts in AI often begin as practical workflow changes before they become industry orthodoxy. ULMFiT was one of those shifts: it made adaptation, reuse, and data efficiency feel operationally real.
TL;DR: fast.ai became influential by teaching people to build useful models early, then learn the theory in context, lowering the barrier to entry for applied deep learning.
Howard co-founded fast.ai with Rachel Thomas, and together they built one of the most recognizable educational brands in applied machine learning. The flagship course, Practical Deep Learning for Coders, deliberately inverted the usual academic sequence. Instead of spending weeks on prerequisites before touching a model, students trained models early and then worked backward into the underlying concepts.
That approach shaped both the curriculum and the software. The fastai library, built on top of PyTorch, emphasized high-level abstractions, sensible defaults, and progressive disclosure of complexity. Beginners could get results quickly, while advanced users could still inspect and customize the stack.
This mattered beyond education. It expanded the pool of people who could become productive with deep learning, especially software developers and domain experts who were not coming from traditional ML research backgrounds.
Howard's nontraditional path is part of the appeal, but it is sometimes oversimplified. He studied philosophy at the University of Melbourne, yes, but he also built and led companies before becoming widely known in AI. He was involved in Fastmail early in its history, founded Optimal Decisions Group, became a top-ranked Kaggle competitor, and later served as Kaggle's President and Chief Scientist.
That mix of entrepreneurship, competitive data science, and teaching helps explain why his writing and talks tend to focus on what works in practice.
Howard's public writing is available at jeremy.fast.ai, and his open-source work is published through his GitHub profile at github.com/jph00.
TL;DR: Answer.AI is a useful example of a small AI-focused lab trying to turn modern tooling into outsized output, though claims about exact team size and impact should be framed cautiously.
Howard and Eric Ries launched Answer.AI in late 2023 as a research-and-product lab focused on practical AI systems. The company has been presented publicly as a deliberately lean organization, and that positioning is central to its message: small teams, if organized well and equipped with strong tools, can produce work that once required much larger groups.
That thesis is plausible, and Answer.AI has produced visible output. Publicly associated projects and initiatives include:
| Project | Year | Description |
|---|---|---|
| ModernBERT | 2024 | Encoder model family released with collaborators including LightOn, designed for long context and modernized BERT-style workloads |
| FastHTML | 2024 | Python framework aimed at building web apps with a strong emphasis on simplicity and AI-era developer workflows |
| Solveit | 2025 | Education-focused initiative associated with Howard and Eric Ries |
| Dialogue engineering | Ongoing | Howard's term for a more structured, iterative way of working with AI systems than one-shot prompting |
A few caveats improve the accuracy here. First, "roughly 14 people" appears to come from public descriptions, but private-company headcount can change quickly and is hard to verify precisely. Second, ModernBERT should not be described as solely an Answer.AI product; it involved external collaborators. Third, "vibe coding" is better avoided unless the article is specifically about that trend, because it dates quickly and adds little precision.
The broader point still stands. AI tools can compress the amount of coordination and boilerplate required to ship software, research artifacts, and educational products. That does not mean every small team will outperform a large one. It does mean the historical relationship between headcount and output is less stable than it used to be.
TL;DR: Howard's influence extends beyond research papers through Kaggle, Enlitic, public education, and issue advocacy, though some "first" claims need careful wording.
Howard was one of Kaggle's best-known competitors before joining the company in leadership. Referring to him as the platform's "globally #1-ranked competitor" is directionally consistent with how he was widely described, though rankings change over time and should not be treated as a permanent title.
He also founded Enlitic, an early company applying deep learning to medical problems such as radiology. Calling it "the first company to apply deep learning to medicine" is too strong and difficult to verify. A more defensible description is that Enlitic was among the early, prominent startups pushing deep learning into medical imaging and clinical workflows.
Howard's public profile also grew through talks and writing. His TED Talk, The wonderful and terrifying implications of computers that can learn, helped introduce a broader audience to machine learning's promise and risks. In 2020, he became a visible advocate in the #Masks4All movement, using data analysis and public communication to argue for mask adoption during the COVID-19 pandemic.
Taken together, these chapters reinforce a consistent theme: Howard tends to favor practical deployment, public explanation, and accessible tools over prestige signaling.
ULMFiT is a 2018 method for fine-tuning a pretrained language model for text classification. Its significance is that it showed transfer learning could work well in NLP, reducing the amount of labeled data needed for strong downstream performance.
Not exactly. It is more accurate to say Howard helped validate the broader pretraining-and-fine-tuning pattern that became central to modern NLP. GPT and BERT came out of a wider research wave, with different architectures and training objectives.
fast.ai emphasizes a top-down, build-first approach. Students start by training useful models, then learn the theory in context. That makes the material more accessible to software developers, analysts, and domain specialists who need practical capability quickly.
No. He studied philosophy at the University of Melbourne. His career is often cited as evidence that strong applied AI work does not require a conventional academic path, although it still requires substantial technical depth and sustained practice.
He uses the term to describe working with AI systems through structured, iterative interaction rather than treating prompting as a one-shot input-output task. The idea is less about a single clever prompt and more about designing a repeatable conversational workflow.
Jeremy Howard's place in AI history is best understood not as the sole inventor behind today's language-model boom, but as one of the field's most effective translators and accelerators. He helped prove that transfer learning in NLP could work, helped teach a generation of practitioners how to use deep learning productively, and continues to argue that small teams can do consequential work when the tooling is right. In 2026, that combination of technical credibility and practical accessibility remains unusually relevant.
Discover more content: