AI memory systems for agents
TECH

42+ Signals

Strategic Overview

  • 01.
    Slack Engineering published an architecture for long-running multi-agent security investigations that abandons forwarding message history entirely, relying instead on three structured context channels (Director's Journal, Critic's Review, Critic's Timeline) that anchor every agent invocation to a shared, summarized state.
  • 02.
    A wave of late-2025 and early-2026 surveys (arXiv:2512.13564, 2601.09113, 2603.07670, 2512.23343) reframes agent memory through cognitive-neuroscience taxonomies, distinguishing parametric, retrieval-based, and agentic (persistent, temporally extended) memory and modeling memory as a write-manage-read loop coupled to perception and action.
  • 03.
    Production memory platforms now report concrete operational gains: Mem0's selective pipeline shows 91% lower p95 latency (1.44s vs 17.12s) and over 90% token-cost savings versus full-context, while a separate Hippocampus module claims up to 31x lower retrieval latency and 14x lower per-query token footprint on LoCoMo and LongMemEval benchmarks.
  • 04.
    The vendor and framework layer is consolidating: Mem0 spans 21 official integrations across Python and TypeScript, Microsoft merged Semantic Kernel and AutoGen into the unified Microsoft Agent Framework, Google open-sourced an Always-On Memory Agent on the Agent Development Kit, and Letta promotes OS-inspired active context control.

The Slack Pivot: Why the Best Multi-Agent Systems Stopped Forwarding Their Own History

The most consequential pattern emerging from production multi-agent work in 2026 is also the most counterintuitive: the highest-performing long-run agent systems no longer pass message history forward at all. Slack Engineering's April 2026 disclosure of its multi-agent security investigation platform makes this explicit - investigations span hundreds of inference requests and generate megabytes of output, and the team found that naive history-forwarding made later agents incoherent rather than better-informed. Their answer is three specialized context channels: a Director's Journal that supports six entry types (decisions, observations, findings, questions, actions, hypotheses), a Critic's Review, and a Critic's Timeline. Every agent invocation reads from these structured channels; none reads raw transcripts.

The deeper claim is architectural. The Director's Journal isn't a log - it is the orchestrator's reasoning rendered as structured working memory, with phase, round, timestamp, priority, and citations attached to every entry. Slack's own framing is that 'the Journal allows the Director to lead the investigation towards a conclusion, to observe and measure its progress.' The Critic, meanwhile, processed 170,000 reviewed findings broken down as 37.7% Trustworthy, 25.4% Highly-plausible, 11.1% Plausible, 10.4% Speculative, and 15.4% Misguided - a quantified epistemic filter that prevents downstream agents from compounding low-quality intermediate beliefs. The lesson many teams are now drawing: multi-agent coherence is a context-engineering problem, not a model-capability problem, and the unit of design is the channel, not the prompt.
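Slack has not published the Journal's schema, but the fields it describes (six entry types plus phase, round, timestamp, priority, and citations) imply a structure along these lines. A hypothetical Python sketch, not Slack's implementation; every name here is an assumption:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class EntryType(Enum):
    # The six entry types Slack describes for the Director's Journal.
    DECISION = "decision"
    OBSERVATION = "observation"
    FINDING = "finding"
    QUESTION = "question"
    ACTION = "action"
    HYPOTHESIS = "hypothesis"

@dataclass
class JournalEntry:
    """One typed entry in a Director's-Journal-style channel (hypothetical schema)."""
    entry_type: EntryType
    phase: str                      # investigation phase this entry belongs to
    round: int                      # orchestration round that produced it
    priority: int                   # relative importance for context assembly
    text: str                       # the summarized content itself
    citations: list[str] = field(default_factory=list)  # evidence references
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def render_context(journal: list[JournalEntry], budget: int) -> str:
    """Assemble an agent's context from the Journal instead of raw transcripts:
    highest-priority entries first, truncated to a rough character budget."""
    ordered = sorted(journal, key=lambda e: e.priority, reverse=True)
    lines: list[str] = []
    used = 0
    for e in ordered:
        line = f"[{e.entry_type.value}|phase={e.phase}|round={e.round}] {e.text}"
        if used + len(line) > budget:
            break
        lines.append(line)
        used += len(line)
    return "\n".join(lines)
```

The point of the sketch is the inversion it encodes: agents never see each other's transcripts, only typed, prioritized, citation-bearing summaries.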

From 'Just Use a Vector DB' to a Five-Family Mechanism Stack

A second shift is conceptual. Three independent surveys published between December 2025 and February 2026 - 'Memory in the Age of AI Agents' (arXiv:2512.13564), 'The AI Hippocampus' (arXiv:2601.09113), and 'Memory for Autonomous LLM Agents' (arXiv:2603.07670) - converge on a much richer ontology than the vector-database-as-memory framing that dominated 2023-2024. The first proposes a three-pronged taxonomy by Forms (token-level, parametric, latent), Functions (factual, experiential, working) and Dynamics (formation, evolution, retrieval). The second sorts memory into implicit (parametric weights), explicit (external retrieval), and agentic (persistent, temporally extended) paradigms. The third formalizes agent memory as a write-manage-read loop and names five concrete mechanism families: context-resident compression, retrieval-augmented stores, reflective self-improvement, hierarchical virtual context, and policy-learned management.

This matters because each family carries a different cost model and failure mode. Context-resident compression saves tokens but loses fine detail; retrieval-augmented stores scale storage but require careful write-gating; reflective self-improvement creates emergent skills but can drift; hierarchical virtual context borrows OS paging metaphors (the explicit pitch behind Letta) but adds scheduling complexity; policy-learned management promises adaptivity but introduces a new training surface. The bymar.co engineering author's blunt summary - 'Just use a vector DB is no longer persuasive' - reflects an industry consensus that production memory now needs to combine several of these families in a single coordinated system, with explicit decisions about what gets written, where it lives, and how it is retrieved.
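The write-manage-read loop that the third survey formalizes can be made concrete in a few lines. This is an illustrative sketch under assumed names and thresholds (the salience score, gate value, and lexical-overlap retrieval are stand-ins), not a reference implementation from any of the surveys:

```python
# Minimal write-manage-read memory loop. The scoring heuristic and
# thresholds are assumptions chosen for illustration.

class MemoryStore:
    def __init__(self, capacity: int = 100):
        self.items: list[tuple[float, str]] = []  # (salience, text)
        self.capacity = capacity

    def write(self, text: str, salience: float, gate: float = 0.5) -> bool:
        """Write phase: gate out low-salience observations instead of
        logging everything (the 'careful write-gating' the surveys call for)."""
        if salience < gate:
            return False
        self.items.append((salience, text))
        return True

    def manage(self) -> None:
        """Manage phase: evict the lowest-salience items when over capacity."""
        if len(self.items) > self.capacity:
            self.items.sort(reverse=True)
            self.items = self.items[: self.capacity]

    def read(self, query: str, k: int = 3) -> list[str]:
        """Read phase: crude lexical-overlap ranking standing in for a
        real retriever (vector, BM25, graph expansion, ...)."""
        q = set(query.lower().split())
        ranked = sorted(
            self.items,
            key=lambda it: len(q & set(it[1].lower().split())),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]
```

Each of the five mechanism families is, in this framing, a different answer to what `write`, `manage`, and `read` should do and where their state should live.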

The Numbers Behind the Hype: Latency, Tokens, and the Selective-Pipeline Win

Slack's Critic processed 170,000 reviewed agent findings; Speculative + Misguided verdicts together account for 25.8% - the quantified epistemic filter that keeps multi-agent investigations coherent.

The economic case for structured memory is now backed by hard production numbers, and they are large enough to reshape build-vs-buy decisions. Mem0's selective pipeline reports a 91% reduction in p95 latency (1.44 seconds versus 17.12 seconds for full-context) and over 90% token-cost savings, while delivering a 26% relative improvement on the LLM-as-a-Judge metric over OpenAI's memory baseline on the LoCoMo benchmark. The separately developed Hippocampus module - a Dynamic Wavelet Matrix that co-indexes binary signatures and token-ID streams - claims up to 31x lower end-to-end retrieval latency and up to 14x lower per-query token footprint on LoCoMo and LongMemEval while maintaining accuracy.
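The shape of the selective-pipeline saving is easy to reproduce in miniature. A toy comparison under assumptions (a 4-characters-per-token heuristic and made-up conversation data); it illustrates why re-sending full history loses to retrieving a few relevant memories, and is not Mem0's actual pipeline:

```python
# Toy illustration of selective retrieval vs. full-context replay on token
# cost. The tokenizer heuristic and data are assumptions for illustration.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def full_context_cost(history: list[str]) -> int:
    """Full-context: every prior turn is re-sent on every request."""
    return sum(approx_tokens(t) for t in history)

def selective_cost(history: list[str], query: str, k: int = 3) -> int:
    """Selective: only the k most query-relevant memories are sent."""
    q = set(query.lower().split())
    ranked = sorted(history, key=lambda t: len(q & set(t.lower().split())), reverse=True)
    return sum(approx_tokens(t) for t in ranked[:k])

history = [f"turn {i}: discussion of unrelated topic number {i}" for i in range(500)]
history.append("user said the deployment target is eu-west-1")
query = "what is the deployment target?"
print(full_context_cost(history), "vs", selective_cost(history, query))
```

The gap grows linearly with session length, which is why the order-of-magnitude production numbers above follow almost mechanically once an agent outlives a single session.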

Three implications follow. First, full-context approaches are now demonstrably uneconomic for any agent that exceeds a single session - order-of-magnitude latency and cost gaps are not survivable in production. Second, benchmarks have caught up: LoCoMo and LongMemEval are now the de facto evaluation surfaces, which is exactly what the Mem0 team means by calling 2026 the year memory became 'a production engineering discipline with real benchmarks, measurable trade-offs, and a growing body of operational knowledge.' Third, the integration layer is the new battleground - Mem0's 21 official Python and TypeScript framework integrations, Google's MIT-licensed Always-On Memory Agent, and Microsoft's Semantic Kernel/AutoGen merger into the Microsoft Agent Framework all suggest that distribution into existing agent stacks, not raw retrieval quality, is what determines who wins the platform race.

The Contrarian View: 'Please Stop Building Memory Frameworks'

Not everyone is bullish. The most-upvoted Reddit thread in this cycle - a 254-upvote r/ClaudeCode post titled 'Please stop creating memory for your agent frameworks' - argues that the proliferation of memory libraries is itself the problem. The author's claim is that environments like Claude Code already expose adequate memory primitives (CLAUDE.md, SKILL.md, tasks, plans, auto-memory), and that bolting on additional 'memory frameworks' bloats context, can triple token usage, and increases hallucination rates. Adjacent r/AI_Agents discussion echoes the frustration from the user side: dumping history burns tokens, summarization loses signal, and vector databases feel clunky in practice.

This tension cuts to the heart of the 2026 design debate. r/AIMemory's PenfieldLabs post argues the opposite - current ChatGPT/Claude/Gemini memories are flat blobs without typed relationships or knowledge graphs, and the future requires hybrid retrieval (BM25 + vector + graph expansion via reciprocal rank fusion), agent-managed write-gating, and structured cards that store the 'why' rather than the 'what.' Both camps actually agree on the failure mode: undisciplined writes and undifferentiated retrieval. They disagree on whether the fix is fewer frameworks or better-typed ones. Developer-education channels reflect the same split: Richmond Alake's 'Architecting Agent Memory' talk for MongoDB frames memory as the strategic 'key pillar' of agentic systems, while Google Cloud Tech's ADK videos walk through a deliberately minimal sessions/events/state primitive set. The Slack architecture is, in effect, an existence proof for the typed-channels camp - typed entries with strict schemas - while the contrarian camp warns that most teams will reach for a heavyweight framework before they have earned the complexity.
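Reciprocal rank fusion, the combiner the PenfieldLabs post names for hybrid retrieval, is simple enough to show directly. A generic sketch fusing ranked lists from assumed BM25, vector, and graph retrievers; the document IDs are placeholders:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with the standard RRF formula:
    score(d) = sum over lists of 1 / (k + rank of d in that list).
    k=60 is the constant from the original Cormack et al. formulation."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Each list stands in for one retriever's output over the same query.
bm25   = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_b", "doc_a", "doc_d"]
graph  = ["doc_b", "doc_e"]
fused = reciprocal_rank_fusion([bm25, vector, graph])
```

RRF's appeal for memory systems is that it needs no score calibration across retrievers: a memory surfaced by several independent channels rises to the top on rank agreement alone.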

Historical Context

2023-01-01
AI agent memory was barely recognized as a distinct engineering discipline; most production systems relied solely on the LLM context window.
2025-04-28
The Mem0 paper introduced a production-oriented memory architecture with a graph-augmented variant, reporting 91% lower p95 latency versus full-context and a 26% relative LLM-as-a-Judge improvement over OpenAI's memory baseline on LoCoMo.
2025-08-01
A cognitive-science-inspired, self-organizing agent memory system applied Event Segmentation Theory to autonomously segment conversations into coherent episodes.
2025-10-01
Microsoft merged Semantic Kernel and AutoGen into the unified Microsoft Agent Framework, signaling enterprise consolidation of agent and memory tooling.
2025-12-15
A multi-author survey introduced a forms/functions/dynamics taxonomy that explicitly distinguishes parametric, retrieval, and agent memory.
2026-01-01
A survey organized LLM and MLLM memory into implicit (parametric), explicit (external retrieval), and agentic (persistent, temporally extended) paradigms, framing the field around human-memory analogies.
2026-02-01
HPE/academic team proposed a Dynamic Wavelet Matrix that compresses and co-indexes binary signatures and token-ID streams, claiming up to 31x lower retrieval latency and 14x token reduction on LoCoMo and LongMemEval.
2026-04-13
Slack Engineering published 'Managing context in long-run agentic applications', detailing the Director/Experts/Critic system, the three-channel context design, and Critic statistics from 170,000 reviewed findings.

Power Map

Key Players

Slack (Salesforce) Engineering

Operator of a production multi-agent security investigation system; published the Director/Experts/Critic architecture and three structured context channels in April 2026, shaping how teams design context for long-running multi-agent systems.

Mem0

Memory-platform vendor shipping a selective-pipeline architecture and graph-augmented variant (Mem0g); used as a benchmark reference for production-scale agent memory and documents 21 framework integrations.

Letta

OS-inspired agent memory provider giving agents active control of working versus long-term context, cited as architecturally distinctive among 2026 frameworks.

Google

Open-sourced an Always-On Memory Agent (MIT-licensed) on the Agent Development Kit, storing structured memories in SQLite with scheduled consolidation.

Microsoft

Merged Semantic Kernel and AutoGen into the unified Microsoft Agent Framework (Oct 2025) and deepened Azure AI Foundry integration for enterprise RAG and memory pipelines in Q1 2026.

Academic survey consortium (HIT, Fudan, Peking, NUS)

Authors of unifying surveys ('Memory in the Age of AI Agents'; 'AI Meets Brain') reframing agent memory taxonomies and connecting cognitive neuroscience to agent design.

Source Articles

Top 5

THE SIGNAL.

Analysts

"Coherence in multi-agent investigations cannot be achieved by carrying message history forward; it requires deliberately designed context channels that anchor every agent to a shared, summarized state. As they put it, 'Maintaining alignment and orientation in multi-agent investigations requires deliberate design.'"

Slack Engineering team
Authors, Slack Engineering blog (April 13, 2026)

"Structured persistent memory, not bigger context windows, is the practical path to long-term conversational coherence in deployed agents - the paper frames the problem around 'structured, persistent memory mechanisms for long-term conversational coherence.'"

Mem0 research team (Chhikara, Khant, Aryan, Singh, Yadav)
Authors, 'Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory' (arXiv:2504.19413)

"Agent memory is what differentiates a stateless generator from an adaptive agent and should be modeled as a write-manage-read loop coupled to perception and action. Memory, Du argues, 'is what turns a stateless text generator into a genuinely adaptive agent.'"

Pengfei Du
Author, 'Memory for Autonomous LLM Agents' survey (arXiv:2603.07670)

"Memory should no longer be treated as a passive log; it has 'evolved into a dynamic cognitive hub that underpins complex decision-making,' best understood through cognitive-neuroscience-grounded taxonomies that map onto agent design."

Liang et al.
Authors, 'AI Meets Brain' survey, Harbin Institute of Technology, Fudan, Peking, NUS (arXiv:2512.23343)

"Agent memory is now a real systems layer with multiple coordinated components; vector databases alone are no longer sufficient to design around - as they bluntly put it, 'Just use a vector DB is no longer persuasive.'"

bymar.co engineering author
Author, 'Agent Memory Systems in 2026: What Actually Matters'
The Crowd

"This is how AI Agent Memory works. In general, the memory for an agent is something that we provide via context in the prompt passed to LLM that helps the agent to better plan and react given past interactions or data not immediately available."

@Aurimas_Gr

"Solving AI Agent Memory Loss: A Seven-Layer Architecture for Persistent Context at Scale"

@TheValueist

"6 Memory Types in AI Agents, The Real Intelligence Layer. Most people talk about prompts. Few talk about memory. But memory is what turns a chatbot into an intelligent agent. Here are the 6 core memory types every serious AI agent should have: Short-Term Memory (STM)..."

@Umesh__digital

"Please stop creating "memory for your agent" frameworks."

u/thurn
Broadcast
Architecting Agent Memory: Principles, Patterns, and Best Practices - Richmond Alake, MongoDB

Memory in AI agents

How to add short-term memory to your AI agent (Sessions & State Explained)
