Context Engineering and Agent Skills

Strategic Overview

  1. Two ideas that looked unrelated in 2025 have fused into a single AI stack paradigm in 2026: context engineering, the discipline of curating exactly which tokens reach the model, and Agent Skills, the markdown-file packaging format that lets any agent load that discipline on demand.
  2. Anthropic formalized Agent Skills as an open standard in December 2025, defining a skill as nothing more than a folder containing a SKILL.md file with YAML frontmatter, plus optional scripts and templates that Claude loads only when relevant.
  3. By March 2026, 32 competing coding tools, including Microsoft VS Code, OpenAI Codex CLI, Cursor, Gemini CLI, JetBrains Junie, AWS Kiro and Block Goose, all read the same SKILL.md format from the same directory layout, making it the fastest cross-vendor standardization event the AI tooling space has seen.
  4. Hallmark, launched by Hassan El Mghari in May 2026, is the highest-profile demonstration that aesthetic discipline can be packaged as a skill: a single SKILL.md that runs 57 slop-test gates and forces Claude Code, Cursor and Codex to stop generating UIs that look statistically average.
  5. The discipline underneath both moves is the primary versus secondary source distinction, where raw data, transcripts and code are treated as ground truth and summaries or documentation are recognized as one step removed and noisier.

From Prompt Engineering to Context Engineering: The Vocabulary Shift That Actually Means Something

The renaming of the discipline from prompt engineering to context engineering was not a marketing tweak. It was a tacit admission that the bottleneck in production LLM systems is no longer how you phrase one instruction but which tokens you let into the window in the first place. Anthropic's own engineering post is unusually blunt about this: good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome [2]. Andrej Karpathy and Tobi Lutke both publicly backed the term in June 2025, and Simon Willison's consolidation piece is where it locked in for the broader developer community [5].

The deeper reason it stuck is that the practice it names is genuinely different. Prompt engineering treats the model like a function and asks how to word the argument. Context engineering treats the agent as a system that runs across many turns and asks which artifacts, tools, retrieved documents, prior tool outputs and reasoning traces should occupy a finite window at each step. The LangChain ecosystem has since anchored on a working frame of four canonical strategies (Write, Select, Compress and Isolate), which has become the community's de facto teaching vocabulary. Practitioners on r/PromptEngineering have started formalizing the same work into a 5-stage pipeline (Curate / Compress / Structure / Deliver / Refresh) plus a tiered memory budget split roughly into working context (60-70%), recent compressed history (20-30%) and always-true facts (10-15%). Failure modes have names now too, including 'context rot,' the gradual degradation of agent quality as stale or low-signal tokens accumulate across turns.
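The tiered memory budget can be made concrete with a small sketch. The function names, the default shares (picked from inside the quoted ranges so that they sum to 1.0) and the precomputed token counts are all assumptions for illustration, not part of any published pipeline.

```python
def split_budget(window_tokens: int,
                 working: float = 0.65,
                 history: float = 0.25,
                 facts: float = 0.10) -> dict[str, int]:
    """Split a context window into the three tiers (shares are assumed defaults)."""
    shares = {"working": working, "history": history, "facts": facts}
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1.0")
    budget = {name: int(window_tokens * share) for name, share in shares.items()}
    # Hand any rounding remainder to the working tier so the total is exact.
    budget["working"] += window_tokens - sum(budget.values())
    return budget


def fit_to_budget(items: list[tuple[str, int]], limit: int) -> list[str]:
    """Walk items in priority order, skipping any that would exceed the limit.

    Each item is (text, token_count); counting tokens is left to the caller.
    """
    kept, used = [], 0
    for text, tokens in items:
        if used + tokens > limit:
            continue
        kept.append(text)
        used += tokens
    return kept
```

Calling `split_budget(200_000)` yields one token allowance per tier, and `fit_to_budget` is then applied to each tier's candidates separately, so stale history cannot crowd out working context.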

The primary versus secondary source metaphor that seeded this cluster makes the heuristic operational: raw data, transcripts and code are primary, summaries and documentation are secondary, and a well-engineered context preserves the primary material when fidelity matters. Anthropic extends the same discipline to multi-agent systems, arguing that sub-agents should communicate through artifacts rather than raw traces, so a web-search agent surfaces only material the downstream agent can actually use rather than dumping its full browsing history [2].
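A minimal sketch of the artifact hand-off, assuming a hypothetical search sub-agent whose pages have already been marked relevant or not. The class names, fields and the 500-character excerpt limit are illustrative assumptions, not an Anthropic API.

```python
from dataclasses import dataclass, field


@dataclass
class SearchTrace:
    """Everything the hypothetical search sub-agent saw: primary, but bulky."""
    query: str
    # Each page is a dict with "url", "text" and a boolean "relevant" flag.
    pages: list[dict] = field(default_factory=list)


@dataclass
class Artifact:
    """The compact hand-off the downstream agent actually receives."""
    query: str
    findings: list[dict]


def to_artifact(trace: SearchTrace, max_chars: int = 500) -> Artifact:
    """Keep only relevant pages and quote their text verbatim (truncated)."""
    findings = [
        {"url": page["url"], "excerpt": page["text"][:max_chars]}
        for page in trace.pages
        if page.get("relevant")
    ]
    return Artifact(query=trace.query, findings=findings)
```

The excerpt is a verbatim slice rather than a generated summary, which keeps the hand-off small while still passing primary material downstream.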

How a 600-Word Markdown File Became the Year's Most Adopted AI Standard

Agent Skills won on the merits of being almost embarrassingly small. The entire specification reduces to a folder containing a SKILL.md file with YAML frontmatter declaring at minimum a name and a description, plus optional scripts, references and templates that Claude only descends into when relevant [1]. That smallness is the feature: it enables progressive disclosure, the design principle Anthropic identifies as the reason skills stay flexible and scalable, because Claude reads the lightweight metadata first and only loads the body of the skill when the model decides the skill is needed [1]. Context windows stay lean by default and expand only on demand.
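Progressive disclosure can be sketched in a few lines. The example assumes a SKILL.md whose frontmatter is fenced by `---` lines and contains only flat `key: value` pairs. Real frontmatter is YAML and would need a proper YAML parser, and the function names here are hypothetical.

```python
from pathlib import Path


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, body).

    Handles only flat `key: value` lines, which is an assumption of this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---'") from None
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    for required in ("name", "description"):
        if required not in meta:
            raise ValueError(f"frontmatter is missing '{required}'")
    return meta, "\n".join(lines[end + 1:]).strip()


def load_metadata(skill_dir: Path) -> dict[str, str]:
    """Stage 1: expose only the lightweight metadata to the model."""
    meta, _ = split_skill((skill_dir / "SKILL.md").read_text(encoding="utf-8"))
    return meta


def load_body(skill_dir: Path) -> str:
    """Stage 2: expose the full instructions once the skill is judged relevant."""
    _, body = split_skill((skill_dir / "SKILL.md").read_text(encoding="utf-8"))
    return body
```

Both stages read the same file from disk. The saving is in what enters the context window: every installed skill contributes its name and description, and only a selected skill contributes its body.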

The adoption curve is the part competitors will be studying for years. Within 48 hours of the December 2025 release, Microsoft had wired SKILL.md into VS Code and OpenAI had shipped support in ChatGPT and Codex CLI [6][7]. By March 2026, 32 cross-vendor tools, including Google's Gemini CLI, JetBrains Junie, AWS Kiro and Block Goose, were all reading the same file format from the same directory layout, an outcome the Paperclipped analysis calls the fastest cross-vendor standardization event in AI tooling [7]. The strategic playbook is identical to Anthropic's earlier donation of the Model Context Protocol to the Linux Foundation: release a tiny, well-scoped spec, open-source the SDK, let competitors adopt it before they build a rival, and end up owning the infrastructure layer of the agent stack by default [6].

The other reason adoption stuck this fast is that the ROI got concrete almost immediately. One widely shared, self-reported developer benchmark found that wiring an Insforge Skills plus CLI context layer into Claude Code reduced a workload from 10.4 million tokens to 3.7 million tokens, cut spend from $9.21 to $2.81, and took the error count from 10 to zero. Numbers that specific (a roughly 2.8x reduction in tokens, a roughly 70% drop in cost, and a clean zero on the error count) are the kind of result that turns a markdown spec from hype into a procurement line item, and builders have been circulating similar context-engineering repos as Meta-Agent knowledge bases for the same reason.
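The headline ratios follow directly from the four reported figures. This short check uses only those numbers, and the variable names are illustrative.

```python
tokens_before, tokens_after = 10.4e6, 3.7e6
cost_before, cost_after = 9.21, 2.81

token_ratio = tokens_before / tokens_after      # how many times fewer tokens
token_drop = 1 - tokens_after / tokens_before   # fractional token reduction
cost_drop = 1 - cost_after / cost_before        # fractional cost reduction

print(f"token ratio: {token_ratio:.2f}x")  # token ratio: 2.81x
print(f"token drop:  {token_drop:.1%}")    # token drop:  64.4%
print(f"cost drop:   {cost_drop:.1%}")     # cost drop:   69.5%
```

The token reduction is therefore closer to 2.8x than a full 3x, and the cost reduction sits just under 70%.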

Hallmark and the Codification of Taste: Why UIs Built by Agents All Look the Same

Hallmark is the cleanest answer yet to a problem the field has been quietly avoiding: agentic code tools, left to their own statistical defaults, generate UIs that look statistically average because the training data is statistically average. Hassan El Mghari's SKILL.md confronts this head-on by picking a macrostructure for the brief from a library of 21 named layouts, dressing it in one of 22 themes across four genres, then running 57 slop-test gates plus a pre-emit self-critique that refuses the on-distribution defaults every LLM was trained into [3][4]. The constraints are unusually specific, with an accent-color budget of under five percent and one anchor hue per page, which is the kind of opinionated rule a senior designer would impose on a junior and which an LLM will never invent on its own [4].
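To make the two quoted constraints concrete, here is a sketch of how an accent-color budget and a single-anchor-hue rule could be checked mechanically. This is not Hallmark's implementation: Hallmark states its gates as instructions in a SKILL.md. The function names, the area-share input format and the 15-degree hue tolerance are all assumptions.

```python
import colorsys


def hue_degrees(hex_color: str) -> float:
    """Return the HSV hue of a '#rrggbb' color, in degrees from 0 to 360."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    hue, _, _ = colorsys.rgb_to_hsv(r, g, b)
    return hue * 360


def check_accent_rules(accent_shares: dict[str, float],
                       max_share: float = 0.05,
                       hue_tolerance: float = 15.0) -> list[str]:
    """Return a list of failed rules; an empty list means the page passes.

    accent_shares maps each accent color to its share of page area (0 to 1).
    """
    failures = []
    total = sum(accent_shares.values())
    if total >= max_share:
        failures.append(
            f"accent colors cover {total:.1%} of the page, "
            f"budget is under {max_share:.0%}"
        )
    hues = [hue_degrees(color) for color in accent_shares]
    if hues:
        anchor = hues[0]
        for hue in hues[1:]:
            # Hue is circular, so measure the shorter way around the wheel.
            distance = abs(hue - anchor)
            distance = min(distance, 360 - distance)
            if distance > hue_tolerance:
                failures.append("more than one anchor hue in use")
                break
    return failures
```

A page with one red accent on 3% of its area passes, while adding a blue accent on another 4% fails both rules.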

The broader pattern Hallmark surfaces is more important than the specific design rules. Taste, design discipline, brand voice, accessibility standards and code review conventions, in short anything that previously lived as tribal knowledge inside a senior practitioner, can now be packaged as a SKILL.md and distributed via npx skills add [8]. That changes the unit of competition in agent tooling: the question is no longer who has the best base model, but who can encode the most opinionated and well-tested discipline into a portable skill that runs on every vendor's agent. The mer.vin walkthrough is explicit about the framing: Hallmark is the first public proof that aesthetic judgment can be encoded as procedural knowledge and shipped through the same channel as a Python package [8].

The Two Things the Standard Has Not Solved: Cross-Tool Portability and Malicious Skills

The interoperability story is real but uneven. Even with 32 vendors reading the same file format, cross-tool portability is the community's main open complaint: the SKILL.md may parse the same everywhere, but the surrounding affordances (memory layers, tool registries and execution sandboxes) are still vendor-specific, so a skill that works flawlessly in Claude Code may behave unpredictably in Codex or Cursor. The r/ContextEngineering survey result and the cross-vendor friction discussion on r/opencode line up on the same point: sessions die at tool boundaries and most serious builders have a hand-rolled workaround. One practical finding circulating among community builders on r/opencode is that MANDATORY-style imperatives inside SKILL.md (and CLAUDE.md) work dramatically better than polite suggestions. Whatever the underlying reason, the prescription is to write skill instructions as hard requirements rather than recommendations.

The second unresolved issue is governance. The Snyk ToxicSkills study cited in the Paperclipped analysis found that 36 percent of analyzed skills contained security flaws and 76 skills had confirmed malicious payloads, a non-trivial baseline rate for an ecosystem that ships skills as executable folders and is now being adopted into enterprise tools [7]. The same property that made SKILL.md spread quickly, a folder anyone can publish and an SDK that registers it automatically, is the property that makes a malicious-skills supply chain plausible. Procurement teams that have spent two years getting comfortable with MCP server review now have a second, faster-growing attack surface to govern, and the open question is whether the standard will evolve a signing or attestation layer before that supply-chain problem hardens.
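Until the standard has a signing or attestation layer, one stopgap a team could apply on its own is to pin each reviewed skill folder to a content digest and refuse anything that has changed since review. The sketch below assumes a locally maintained allowlist keyed by folder name. The function names are hypothetical and none of this is part of the Agent Skills specification.

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """Return a SHA-256 digest over every file path and content in the folder."""
    digest = hashlib.sha256()
    files = sorted(
        (p for p in skill_dir.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(skill_dir).as_posix(),
    )
    for path in files:
        rel = path.relative_to(skill_dir).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so different file layouts cannot collide.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def is_pinned(skill_dir: Path, allowlist: dict[str, str]) -> bool:
    """True only if the folder matches the digest recorded when it was reviewed."""
    return allowlist.get(skill_dir.name) == skill_digest(skill_dir)
```

A digest only proves that a skill is unchanged since someone reviewed it. It says nothing about whether the reviewed version was safe, so it complements a scan or manual review and does not replace one.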

Historical Context

2025-06-19
Karpathy publicly endorses context engineering over prompt engineering on X, giving the term its industry-wide push and seeding the vocabulary shift.
2025-06-27
Willison consolidates the Karpathy and Lutke posts into the canonical context engineering essay that fixes the term's meaning for the developer community.
2025-12-09
Anthropic donates the Model Context Protocol to the Linux Foundation, establishing its pattern of open-sourcing agent infrastructure before launching SKILL.md.
2025-12-18
Anthropic publishes Agent Skills as an open standard with spec and SDK at agentskills.io; Microsoft and OpenAI integrate it within 48 hours.
2026-03
Thirty-two competing tools, including Gemini CLI, JetBrains Junie, AWS Kiro and Block Goose, all read the same SKILL.md format from the same directory layout.
2026-05-19
Hallmark launches as the flagship anti-AI-slop design SKILL.md for Claude Code, Cursor and Codex, encoding 57 slop-test gates and 22 themes into a single shareable skill.

Power Map

Key Players

Anthropic

Authored both the canonical context engineering essay and the Agent Skills specification, published the open standard and SDK at agentskills.io, and is now steering the SKILL.md format as industry infrastructure.

Hassan El Mghari (Nutlope), with Together AI

Built and launched Hallmark on 19 May 2026, the flagship anti-AI-slop design SKILL.md distributed via npx skills add nutlope/hallmark.

Microsoft, OpenAI, Cursor

Early adopters of the Agent Skills standard; Microsoft wired SKILL.md into VS Code and OpenAI shipped it into ChatGPT and Codex CLI within 48 hours of release, while Cursor is among the lead targets for Hallmark-style skills.

Google, JetBrains, AWS, Block

Shipped SKILL.md support in Gemini CLI, Junie, Kiro and Goose respectively, completing the 32-vendor interoperability mesh by March 2026.

Canva, Stripe, Notion, Zapier

Partner-built skills available at launch in Anthropic's skill directory, extending Claude into enterprise workflows without bespoke per-vendor integrations.

Simon Willison

Independent commentator who consolidated the Karpathy and Lutke posts into the canonical context engineering essay, writing of the term: 'I like it. I think this one may have sticking power.'

Fact Check

8 cited
  [1] Equipping agents for the real world with Agent Skills
  [2] Effective context engineering for AI agents
  [3] Nutlope/hallmark: A design skill that refuses to look AI-generated
  [4] Hallmark: A design skill that refuses to look AI-generated
  [5] Context engineering
  [6] Anthropic Opens Agent Skills Standard, Continuing Its Pattern of Building Industry Infrastructure
  [7] Agent Skills: An Open Standard for AI Agent Interoperability
  [8] Hallmark Design Skill: Anti AI-Slop UI for Claude Code and Cursor

THE SIGNAL.

Analysts

"Argues that context engineering better describes industrial LLM work than prompt engineering, because production systems are mostly about packing the window with the right state, history, tools and retrieved data rather than wording a single instruction. He calls it the delicate art and science of filling the context window with just the right information for the next step."

Andrej Karpathy
Former Director of AI at Tesla; OpenAI co-founder

"Endorsed the term as a more accurate label for what real LLM app builders do, framing it as the art of providing all the context for the task to be plausibly solvable by the LLM."

Tobi Lutke
CEO, Shopify

"Believes context engineering has staying power because the definition people infer from the name is close to its intended meaning, and because it immediately makes sense to anyone who has shipped a production LLM app."

Simon Willison
Independent developer and author of simonwillison.net

"Defines context engineering as the set of strategies for curating and maintaining the optimal set of tokens during LLM inference, and positions skills as the packaging format for re-usable procedural knowledge that travels with the agent."

Anthropic Engineering team
Authors of the Agent Skills and Context Engineering posts

"Treats the cross-vendor SKILL.md adoption as a once-in-a-cycle interoperability event and warns that the open ecosystem already has a meaningful malicious-skill problem that procurement teams will need to govern."

Paperclipped
Independent technical blog covering AI tooling standards

The Crowd

"I'm excited to share a new repo: Agent Skills for Context Engineering Instead of just offering a library of black-box tools, it acts as a "Meta-Agent" knowledge base. It provides a standard set of skills, written in markdown and code, that you can feed to an agent so it understands how to manage its own cognitive resources."

@koylanai

"Claude Code used 3x fewer tokens with one change: - Before: 10.4M tokens · 10 errors · $9.21 - After: 3.7M tokens · 0 errors · $2.81 I used Insforge Skills + CLI as the backend context engineering layer for Claude Code (open-source and local)."

@_avichawla

"The rise of context engineering. "Context engineering" has been an increasingly popular term used to describe a lot of the system building that AI engineers do. But what is it exactly? The definition I like: "Context engineering is building dynamic systems to provide the right information and tools in the right format such that the LLM can plausibly accomplish the task.""

@hwchase17

"35 skills, 3 MCP servers, persistent memory. I built the AI engineering stack I always wanted"

@u/referentuser74

Broadcast
Context Engineering for Agents

Context Engineering in 29 Minutes: Complete Course

3 New Context Engineering Skills for Agents