Andrej Karpathy LLM Wiki and Idea Files

Strategic Overview

  • 01.
    Andrej Karpathy published a viral workflow (12M views) for using LLMs to build and maintain personal knowledge wikis in Markdown, bypassing traditional RAG and vector databases entirely.
  • 02.
    The system uses a three-layer architecture — Raw Sources, Wiki, and Schema — where the LLM acts as a 'compiler' that reads raw documents and produces structured, interlinked wiki articles with backlinks and cross-references.
  • 03.
    Karpathy also introduced 'idea files' — conceptual blueprints designed for LLM agents to build customized applications from scratch, representing a potential new paradigm for software distribution.
  • 04.
    His research wiki grew to approximately 100 articles and 400,000 words on a single topic, with a single ingested source touching 10-15 wiki pages during compilation.

From RAG to Compilation: Why Karpathy Thinks Vector Databases Are Overkill

The dominant paradigm for giving LLMs access to personal or enterprise knowledge has been Retrieval-Augmented Generation — embedding documents into vector databases and retrieving relevant chunks at query time. Karpathy's LLM wiki takes a fundamentally different approach. Instead of searching for relevant fragments on the fly, the LLM reads raw source documents upfront and 'compiles' them into structured, interlinked Markdown articles. The wiki becomes a persistent, compounding artifact rather than a transient retrieval result.
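Karpathy's actual scripts are unpublished, so as a rough sketch of what one 'compile' step might look like, the helper below (all names and prompt wording are hypothetical) assembles a compiler-style prompt from a raw source document and the wiki's current index, leaving the model call itself to the reader's preferred client:

```python
def build_compile_prompt(source_text: str, index_md: str) -> str:
    """Assemble a 'compiler' prompt from a raw source and the wiki's
    current index.md, asking the model to emit updated Markdown pages.

    Hypothetical sketch: Karpathy's real scripts are not published,
    so the prompt wording here is illustrative only.
    """
    return (
        "You maintain a personal research wiki written in Markdown.\n"
        "Current index of articles:\n\n"
        f"{index_md}\n\n"
        "Compile the RAW SOURCE below into the wiki: update every "
        "affected article, add [[wikilinks]] between related pages, "
        "and propose new articles for concepts the index lacks.\n\n"
        f"RAW SOURCE:\n{source_text}"
    )

# Example: one source, a one-line index.
prompt = build_compile_prompt(
    "Notes on sparse attention kernels...",
    "- attention.md: survey of attention variants",
)
```

Because the prompt carries the whole index rather than retrieved chunks, the model can decide for itself which pages a source should touch.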

The argument is partly about scale. At approximately 100 articles and 400,000 words, a personal research wiki fits comfortably within modern LLM context windows. Vector databases introduce retrieval noise and infrastructure overhead that, Karpathy argues, exceed their value at this scale. The compilation approach also produces something RAG cannot: a navigable, human-readable knowledge base with cross-references, backlinks, and synthesis across sources. A single ingested document may touch 10-15 wiki pages during compilation, creating connections that chunk-based retrieval would miss entirely. This challenges the industry's heavy investment in vector database infrastructure for personal knowledge management, though it remains an open question whether the approach scales to enterprise-sized corpora.
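The scale claim is easy to sanity-check with rough arithmetic; the tokens-per-word ratio and per-article word count below are assumptions, not figures from the source:

```python
# Back-of-envelope check on the scale argument. The token ratio is a
# common rule of thumb for English text, not a measurement.
WIKI_WORDS = 400_000            # ~100 articles
TOKENS_PER_WORD = 1.33          # rough tokens-per-word for English

wiki_tokens = int(WIKI_WORDS * TOKENS_PER_WORD)       # ~532,000 tokens

# A single source touches 10-15 pages; at ~4,000 words per article
# (400k words / 100 articles), the working set for one compile step
# is far smaller than the whole wiki.
slice_tokens = int(15 * 4_000 * TOKENS_PER_WORD)      # ~80,000 tokens

print(f"whole wiki:    ~{wiki_tokens:,} tokens")
print(f"compile slice: ~{slice_tokens:,} tokens")
```

The full wiki fits only the largest context windows, but any topic-relevant slice fits comfortably in all current frontier models, which is the scale at which each compile or query step actually operates.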

The Idea File: Sharing Blueprints Instead of Code

Perhaps the most forward-looking element of Karpathy's gist is the 'idea file' concept. In traditional software distribution, developers share specific implementations — packages, containers, compiled binaries. Karpathy proposes that in the LLM agent era, you share the idea instead: a high-level conceptual blueprint that describes what an application should do, and let each user's LLM agent build a customized implementation from scratch.

This is not merely a philosophical musing. The LLM wiki gist itself is published as an idea file — it describes the architecture, the three-layer structure, the operational cycle, and the design principles, but it is not a repository you clone and run. The document's job, as one analysis put it, is to 'communicate the pattern.' Harrison Chase, founder of LangChain, immediately questioned whether idea files are simply product requirements documents (PRDs) by another name. The distinction may be that PRDs are written for human engineering teams, while idea files are explicitly designed for LLM interpretation — optimized for machine comprehension with the expectation that the agent will make implementation decisions autonomously. Jack Dorsey endorsed the concept with a simple 'great idea file,' signaling interest from the broader tech community. If the pattern takes hold, it could shift open-source culture from 'open code' to 'open ideas.'

The Contamination Problem: When Your Wiki Writes Itself

Steph Ango, CEO of Obsidian, raised a concern that cuts to the heart of LLM-maintained knowledge: contamination. If an LLM writes and maintains your entire wiki, how do you distinguish your own thinking from the machine's synthesis? Ango recommended vault separation — keeping AI-generated wikis in a separate Obsidian vault from human-curated personal notes — to maintain a clear trust boundary.

This tension is not trivial. Karpathy's system explicitly states that the user 'never (or rarely) writes the wiki directly.' The LLM handles compilation, cross-referencing, health checks, and even gap-filling by searching the web for missing data. The wiki becomes a mirror of the LLM's interpretation of your sources, not necessarily of your own understanding. For research purposes, this may be perfectly acceptable — the wiki is a reference tool, not a journal. But as these systems become more sophisticated and the line between 'my knowledge' and 'my LLM's knowledge' blurs, the epistemological questions multiply. The 4-phase cycle — Ingest, Compile, Query/Enhance, Lint/Maintain — means the wiki is constantly being rewritten by the machine, raising questions about versioning, attribution, and intellectual ownership that the current architecture does not fully address.
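The 4-phase cycle can be written down as a plain loop. Since the underlying scripts are unpublished, the skeleton below is illustrative only, with every LLM call stubbed out:

```python
def run_cycle(sources: list[str]) -> dict[str, str]:
    """One pass of the Ingest -> Compile -> Query/Enhance ->
    Lint/Maintain cycle. Illustrative skeleton: each phase records
    what a real implementation (an LLM call) would do."""
    state: dict[str, str] = {}
    # 1. Ingest: stage each raw source alongside the wiki.
    for i, src in enumerate(sources):
        state[f"source-{i}"] = f"ingested {src}"
    # 2. Compile: (stub) the LLM rewrites the 10-15 pages each
    #    source touches and adds cross-links.
    state["compile"] = f"compiled {len(sources)} source(s)"
    # 3. Query/Enhance: (stub) answer questions against the wiki and
    #    web-search to fill gaps flagged during compilation.
    state["enhance"] = "gaps filled"
    # 4. Lint/Maintain: (stub) check cross-references, refresh
    #    index.md summaries, append events to log.md.
    state["lint"] = "wiki consistent"
    return state
```

Each pass rewrites wiki state in place, which is precisely why versioning and attribution questions arise: nothing in the loop distinguishes human-authored text from machine synthesis.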

Why Wikis Die and Why LLMs Might Save Them

Personal wikis have a well-documented failure mode: they die from maintenance burden. The initial enthusiasm of building a knowledge base gives way to the grinding work of keeping cross-references current, updating summaries when new information arrives, and filling gaps. Most personal wikis become digital graveyards within months. Karpathy's insight is that LLMs never tire of this work.

The system's maintenance cycle — lint checks for inconsistencies, automated imputation of missing data via web searches, discovery of connections for new article candidates — addresses exactly the chores that kill human-maintained wikis. Two critical files manage navigation: index.md serves as a content-oriented catalog with summaries, while log.md acts as an append-only chronological record. The LLM auto-maintains both. Karpathy noted that he 'thought he had to reach for fancy RAG' but found that the LLM was effective at maintaining index files and brief summaries on its own. Elvis Saravia at DAIR.AI independently validated this, reporting that his automated research curation system became 'remarkably good' at the same maintenance tasks. The pattern suggests that LLMs may be uniquely suited to the thankless middle-ground work of knowledge management — not the initial capture or the final insight, but the ongoing curation that makes knowledge findable and trustworthy.
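As a concrete example of the lint chores described above, the sketch below (assuming Obsidian-style [[wikilinks]]; this is not Karpathy's actual code) flags links pointing at missing articles and shows the append-only discipline for log.md:

```python
import re
from pathlib import Path

# Matches the page-name part of [[Page]], [[Page|alias]], [[Page#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def lint_wikilinks(wiki_dir: Path) -> list[str]:
    """Report [[wikilinks]] whose target article does not exist --
    one of the consistency chores that kills human-maintained wikis."""
    pages = {p.stem for p in wiki_dir.glob("*.md")}
    dangling = []
    for page in wiki_dir.glob("*.md"):
        for target in WIKILINK.findall(page.read_text()):
            if target.strip() not in pages:
                dangling.append(f"{page.name} -> [[{target.strip()}]]")
    return sorted(dangling)

def append_log(wiki_dir: Path, entry: str) -> None:
    """log.md is append-only: events are added at the end, never edited."""
    with open(wiki_dir / "log.md", "a") as f:
        f.write(f"- {entry}\n")

# Demo in a throwaway directory.
import tempfile
demo = Path(tempfile.mkdtemp())
(demo / "attention.md").write_text("Background in [[sparsity]].")
problems = lint_wikilinks(demo)          # [[sparsity]] has no article yet
append_log(demo, "ingested: attention paper")
```

A dangling link like this is exactly the kind of finding the system would hand back to the LLM as a new article candidate rather than surface to the user as a chore.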

From Memex to LLM Wiki: Eighty Years of Personal Knowledge Infrastructure

In 1945, Vannevar Bush imagined the Memex — a device that would let individuals store, link, and retrieve their accumulated knowledge through associative trails. Every personal knowledge management tool since, from hypertext to wikis to Notion to Obsidian, has been an attempt to realize some version of that vision. Karpathy's LLM wiki may be the closest implementation yet, not because of the storage mechanism, but because the LLM provides the missing piece: an agent that can create and maintain the associative links automatically.

What makes this moment different from previous iterations is the compiler metaphor. Previous tools gave humans better interfaces for manually organizing knowledge. Karpathy's system delegates the organization itself to the machine. The human's role shifts from curator to commissioner — choosing what raw sources to ingest and what questions to ask, while the LLM handles the structural work of synthesis, linking, and maintenance. Glen Rhodes identified this as a shift 'from LLMs as answer machines to LLMs as knowledge infrastructure,' which is not a small change. Karpathy himself acknowledges the system is currently a 'hacky collection of scripts' and sees room for a dedicated product. Given his track record of naming paradigms that stick — 'vibe coding' entered the lexicon almost overnight — the LLM wiki concept may rapidly evolve from a personal workflow into a product category.

Historical Context

1945-01-01
Vannevar Bush proposed the 'Memex' concept, a theoretical device for storing and linking personal knowledge; Karpathy's LLM wiki is now seen as a modern realization of it.
2025-01-01
Karpathy coined 'vibe coding' and published the '2025 LLM Year in Review,' laying the intellectual groundwork for the knowledge compilation paradigm.
2026-04-02
Karpathy posted the original 'LLM Knowledge Bases' tweet on X; it went viral, reaching 12M views and sparking widespread discussion of LLM-maintained wikis.
2026-04-04
Karpathy published the full 'LLM Wiki' GitHub gist, detailing the three-layer architecture and introducing the 'idea file' concept for software distribution.

Power Map

Key Players
Subject

Andrej Karpathy LLM Wiki and Idea Files

Andrej Karpathy

Originator of both the LLM Wiki workflow and Idea Files concept. Co-founder of OpenAI and former Director of AI at Tesla, whose influence causes rapid community adoption of his proposed paradigms.

Obsidian / Steph Ango

CEO of Obsidian, the recommended viewer in Karpathy's workflow. Recommended vault separation to prevent AI-generated content from contaminating human-curated knowledge bases. Obsidian has 1.5M+ users.

Vector database and RAG industry

Potentially disrupted stakeholder. Karpathy's approach challenges the necessity of vector databases and complex RAG pipelines for personal-scale knowledge management.

Elvis Saravia / DAIR.AI

AI researcher who validated the LLM wiki pattern independently, implementing a parallel system for automated research paper curation.

THE SIGNAL.

Analysts

"Believes LLMs should shift from generating code to maintaining knowledge. Argues that knowledge compilation is more practical than RAG for personal-scale bases. Sees room for a dedicated product beyond his current 'hacky collection of scripts.'"

Andrej Karpathy
Co-founder of OpenAI, former Director of AI at Tesla

"Confirmed that the LLM knowledge base pattern works for AI research curation, noting that what started as manual review is now fully automated and 'remarkably good at capturing the best of the best.'"

Elvis Saravia
Founder, DAIR.AI

"Recommended vault separation to prevent AI-generated content from contaminating human-curated personal knowledge vaults, highlighting the trust boundary between human and machine-written knowledge."

Steph Ango
CEO, Obsidian

"Sees the LLM wiki as a fundamental paradigm shift, arguing that 'the real shift here is from LLMs as answer machines to LLMs as knowledge infrastructure.'"

Glen Rhodes
Tech blogger and analyst
The Crowd

"LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest."

@karpathy

"Wow, this tweet went very viral! I wanted share a possibly slightly improved version of the tweet in an idea file. The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea."

@karpathy

"great idea file"

@jack