Capture the Why
Enterprise AI fails because organizations never captured decision context. The CRM records the discount; the reasoning disappears into Slack. How to build decision logging and knowledge graphs.
You’ve been in this meeting.
A customer escalation comes in. A pricing exception from six months ago. Someone asks: “Why did we agree to this?”
Nobody knows. It’s in someone’s head. Maybe. Or in a Slack thread from March that nobody will ever find. Or in meeting notes that weren’t tagged, weren’t summarized, weren’t connected to anything.
The CRM says 20% discount. That’s a fact. But why 20%? What was the customer’s situation? Who approved the exception? What precedent was set? Nobody wrote it down.
Now imagine your AI agent handling the next renewal for a similar customer. It searches your docs. Finds the policy: 10% max discount. Proposes 10%.
The customer churns. Because last time, a human knew the full picture. The incident history, the strategic value, the relationship context. A judgment call was made. The AI had the documents. It didn’t have the understanding.
The reasoning was never captured as data in the first place.
Three Layers of Context Your AI Doesn’t Have
When people say “we need to give our AI more context,” they usually mean documents. Throw more PDFs into the vector database. Index Confluence. Connect Google Drive.
That’s one layer. There are three.
Layer 1: Knowledge
What’s true now. Documents, wikis, policies, data. The stuff in Confluence, Google Drive, your internal wikis.
This is what RAG addresses. Semantic search over your documents. The pattern is understood, the tooling is mature. Most companies have some version of this.
It’s the next two that almost nobody builds.
Layer 2: Structure
Who owns what. How things connect. How information flows.
Which engineer owns the payments service? Who’s the escalation path for the Acme account? How does the support team’s backlog connect to the product roadmap?
These relationships exist in people’s heads. Scattered across org charts that are three months out of date, CRM fields that were never filled in, Slack channels you’d have to know about to search.
Ask your RAG system “who should I talk to about the Acme billing issue?” It’ll search for documents containing those words. It doesn’t know that Sarah owns Acme, that she escalated a billing dispute last month, and that the finance team resolved it with a credit. That’s structural context. Almost no system models it.
There’s a related gap that’s harder to see. Your process documentation says the support escalation has five steps. In practice, it has twelve. Undocumented variants, common workarounds, a manual exception-handling step everyone knows about but nobody wrote down. Tools like Celonis and Skan AI watch what actually happens. Which screens people click through, what sequence transactions actually follow, where the real process diverges from the flowchart. That’s structural context too. And it’s where agents trained on your documented processes fail hardest.
Layer 3: Reasoning
Why decisions were made. Under what conditions. With what exceptions.
This is the layer that evaporates fastest.
The VP approved a 20% discount. Why? The customer had three P1 incidents that quarter, was evaluating a competitor, and the account was strategic for a new market segment. The VP weighed all of this, consulted the deal desk, and made a call.
The CRM records one fact: 20% discount. Everything that made the decision legible (the inputs, the policy evaluation, the exception route, the approval chain) disappears into Slack threads and someone’s memory.
Foundation Capital calls the accumulated structure of these traces a “context graph.” A living record of decision traces stitched across entities and time so precedent becomes searchable. HBR argues from the competitive angle that this is where AI advantage actually lives. Murty and Kumar examined over 200 work patterns across 50+ large enterprises and found that two organizations with identical CRMs in identical industries operate completely differently. None of that operational behavior is captured anywhere.
Think about why companies created roles like RevOps. No software knew how the sales process actually worked end-to-end. Someone had to hold that understanding in their head. The handoffs between teams, the exceptions to the standard process, the context that made a deal work. That’s what those roles do: carry the reasoning that no system captures.
When an AI agent sits in that workflow, it can do something the human couldn’t: persist what it learns. Every decision, every exception, every precedent, captured as data instead of disappearing when someone switches jobs. The role doesn’t go away, but the context it carries finally becomes a system of record instead of institutional memory that walks out the door.
What You Can Build Today
The tooling isn’t perfect. But it’s further along than you’d think.
Tier 1: Decision Logging
Time to build: Days. What you need: An LLM, a database, and an automation tool like n8n or Make.
Most decisions happen in meetings. Not in chat, not in documents. Someone says “let’s go with option B” and everyone nods. If you’re lucky, the meeting notes mention it. If you’re not, it evaporates.
The good news: most companies already generate meeting transcripts and AI summaries. Google Meet has built-in Gemini summaries. Otter.ai, Granola, and Fireflies do the same. You already have the raw material. You’re just not extracting the right things from it.
The pipeline: After every meeting, an automation picks up the transcript and summary. An LLM processes both together, looking specifically for decisions. Not action items, not summaries. Decisions: what was decided, why, what alternatives were considered, who owns it, when it should be revisited. The output is a structured record:
```json
{
  "decision": "Switch to usage-based pricing for enterprise",
  "rationale": "Seat-based pricing causes churn during low-usage periods",
  "alternatives_considered": [
    "Hybrid model (rejected: too complex)",
    "Keep seat-based (rejected: churn continues)"
  ],
  "owner": "CRO + CFO",
  "review_date": "2026-09-01",
  "tags": ["pricing", "enterprise"],
  "exceptions": [{
    "description": "Legacy accounts grandfathered until 2026-01-01",
    "approved_by": "CRO"
  }]
}
```
The system posts a confirmation to the relevant chat channel or sends an email: “Decision logged: Switch to usage-based pricing for enterprise.” People can correct it if the extraction was wrong. And they will need to. LLM extraction is confident but imperfect. A meeting where someone said “I think we should go with B, but let’s check with finance first” will sometimes show up as a decision when it was still a proposal. The confirmation step is what keeps the log trustworthy.
Weekly, the system scans decisions approaching their review dates and flags them. “You decided X six months ago. Still valid?”
That’s it. No graph database. No infrastructure project. A webhook, an LLM call, and a database.
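The storage-and-review side of that pipeline is small enough to sketch. This is a minimal illustration, assuming the LLM step returns JSON shaped like the record above; the field names come from that example, and the weekly scan is the review-date flagging described earlier:

```python
import json
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Decision:
    decision: str
    rationale: str
    owner: str
    review_date: date
    alternatives_considered: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)

def parse_decision(raw: str) -> Decision:
    """Validate the LLM's JSON output before it enters the log.

    A malformed record fails loudly here instead of silently
    polluting the decision log.
    """
    data = json.loads(raw)
    return Decision(
        decision=data["decision"],
        rationale=data["rationale"],
        owner=data["owner"],
        review_date=date.fromisoformat(data["review_date"]),
        alternatives_considered=data.get("alternatives_considered", []),
        tags=data.get("tags", []),
        exceptions=data.get("exceptions", []),
    )

def due_for_review(log, today, window_days=7):
    """Weekly scan: decisions whose review date falls inside the window."""
    horizon = today + timedelta(days=window_days)
    return [d for d in log if d.review_date <= horizon]
```

The validation step doubles as the contract for the confirmation message: if parsing fails, the automation posts "couldn't log this decision" instead of a confident wrong summary.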
But already: “How did we handle enterprise pricing before?” has an answer. “Who approved the exception for legacy accounts?” has an answer. “What was the rationale for switching?” has an answer.
For decisions that happen outside meetings, you need a lightweight capture channel. A dedicated email address (decisions@yourcompany.com) that people forward important threads to. A chatbot people can @mention. A simple form. The bar should be low enough that someone can capture a decision in under 10 seconds. The meeting pipeline handles the bulk. These channels catch the rest.
For engineering teams, this pattern already exists: Architecture Decision Records. Tools like adr-tools and log4brains have been doing structured decision capture for years. The move is extending this beyond code architecture to business decisions.
The n8n and Make template ecosystems both have working workflows for meeting enrichment. Transcript in, structured output posted to your tools. With some custom prompting, you can extend these to extract the decision records shown above.
Start with one meeting. Pick a recurring meeting that produces real decisions. Build the n8n or Make workflow for that one meeting: transcript lands, automation triggers, decisions get extracted, confirmation gets posted, corrections get captured. Run it for a week. If the extraction quality holds up, expand to more meetings.
A note on GDPR and data residency. Processing meeting transcripts through an LLM means employee conversations leave your infrastructure. Most providers offer EU data residency now. Google Cloud lets you set processing regions for Vertex AI and Gemini APIs. OpenAI offers a Data Processing Addendum and EU API endpoints for ChatGPT Enterprise. Anthropic’s Claude API supports EU data processing through AWS (Frankfurt, Ireland) or GCP (Netherlands) hosting. The specifics differ, but the principle is the same: choose your provider and region before you build, not after. That matters for GDPR compliance and it matters for getting this past your legal team.
Tier 2: Knowledge Graph
Time to build: Weeks. What you need: A graph database, something to extract entities from your content, and a way for agents to query it.
Tier 1 tells you what was decided. But organizations change constantly. People switch roles. Policies get updated. Customers churn and come back. A knowledge system that only knows current state will serve you stale context and not know it.
A knowledge graph models your organization as entities (people, companies, projects, decisions) and relationships between them. Not just “this document mentions Sarah” but “Sarah owns the Acme account, approved a pricing exception in Q2, and reports to the VP of Sales.” Queryable structure, not just keyword matches.
Two open-source projects stand out.
GraphRAG (open-source, from Microsoft Research) builds a knowledge graph from your documents, creates community summaries, and uses them for retrieval. Ask “who should I talk to about the Acme billing issue?” and instead of returning documents containing those words, it traverses the graph: Sarah owns Acme, she escalated a billing dispute last month, finance resolved it with a credit. In benchmarks, relationship-heavy queries show 3x+ improvement over pure vector search. The trade-off is cost: significantly more tokens per query, especially for global queries across large document sets. For stable document collections where accuracy matters more than speed, it’s the strongest option.
Graphiti (from the Zep team) solves a different problem. Most systems treat facts like light switches: on or off. But facts in organizations aren’t like that. Sarah owns the Acme account. Then she doesn’t. The system learns about it two days after it happens. A new fact corrects an old one, but the old one was true when you made a decision based on it.
The bi-temporal model makes all of this recoverable. Every fact gets four timestamps:
- When true in reality: Sarah started at Anthropic on March 1st.
- When no longer true in reality: Sarah left Anthropic on September 15th.
- When the system learned it: We ingested the Slack announcement on March 3rd.
- When the system learned it was outdated: We ingested the departure notice on September 16th.
Why does this matter? You can ask “What was true about Customer X when we made that deal in Q2?” and get the answer as it was at that point in time. Not today’s answer applied retroactively. When a new fact contradicts an old one, it doesn’t get deleted. It gets marked as superseded. The history stays. This is how you avoid the silent staleness problem I wrote about in On Context.
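To make the bi-temporal idea concrete, here is a minimal sketch. This is not Graphiti’s actual API; the class and field names are illustrative, mapping one-to-one onto the four timestamps above:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date               # when it became true in reality
    valid_to: Optional[date]       # when no longer true (None = still true)
    recorded_at: date              # when the system learned it
    superseded_at: Optional[date]  # when the system learned it was outdated

def as_of(facts, when):
    """What was true in reality at `when`, per current knowledge.

    Superseded facts are never deleted, so point-in-time queries
    keep working after corrections arrive.
    """
    return [
        f for f in facts
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)
    ]
```

With the Sarah example loaded, `as_of(facts, june_1)` returns the ownership fact that was true in June, even though the system has since learned it was superseded in September.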
So that pricing exception from six months ago? With Graphiti, the agent handling the next renewal doesn’t just find “10% max discount” in the policy docs. It finds the decision trace: 20% exception, approved by the VP, three P1 incidents that quarter, strategic account in a new market segment. It sees the precedent, the reasoning, the conditions. It proposes 20% again, or flags it for a human to review. The customer doesn’t churn.
Graphiti ships an official MCP server, and community-built MCP wrappers exist for GraphRAG too, so Claude, Cursor, and any MCP-compatible agent can query the graph directly. (Other tools worth evaluating: LightRAG for cost-efficient incremental graphs, Mem0 for agent memory without schema modeling, Cognee for the simplest possible setup. Links in Go Deeper.)
Tier 3: Full Context System
Time to build: Months. Ongoing maintenance. What you need: Neo4j + a vector database (Qdrant, Chroma, pgvector) + Langfuse + n8n + MCP.
The full architecture. Five layers working together.
Capture. n8n workflows ingest from Slack, CRM, Jira, meeting transcripts. Webhooks, API polling, whatever gets the data in.
Extraction. An LLM processes raw content into entities and relationships. Neo4j’s GraphRAG Python package handles this. “Sarah from Engineering approved a 20% discount for Acme” becomes: Person(Sarah) → Team(Engineering), Person(Sarah) → Decision(20% discount), Decision → Account(Acme). Entities and relationships, extracted automatically.
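A sketch of what that extraction output might look like on its way into the graph. The triple shapes and relationship names are assumptions for illustration, not the GraphRAG package’s schema, and a production pipeline would use parameterized Cypher rather than string formatting:

```python
# Hypothetical extraction output for the sentence in the text.
# Each triple is ((label, name), relationship, (label, name)).
triples = [
    (("Person", "Sarah"), "MEMBER_OF", ("Team", "Engineering")),
    (("Person", "Sarah"), "APPROVED", ("Decision", "20% discount")),
    (("Decision", "20% discount"), "APPLIES_TO", ("Account", "Acme")),
]

def to_cypher(triple):
    """Render one triple as an idempotent Cypher MERGE statement.

    MERGE (rather than CREATE) means re-ingesting the same sentence
    doesn't duplicate nodes or edges.
    """
    (src_label, src_name), rel, (dst_label, dst_name) = triple
    return (
        f"MERGE (a:{src_label} {{name: '{src_name}'}}) "
        f"MERGE (b:{dst_label} {{name: '{dst_name}'}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
```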
Storage. A hybrid backend. Graph database for relationships, structure, and temporal queries. Vector database for semantic similarity search. Both, because they answer different questions:
| Question type | Vector DB | Graph DB |
|---|---|---|
| “Find things similar to X” | Yes | No |
| “What did we decide about X?” | Partial | Yes |
| “How does A relate to B via C?” | No | Yes |
| “What was true at time T?” | No | Yes |
Hybrid beats pure vector on enterprise data. Pretty consistently. The graph catches what semantics miss.
Freshness. Content-type TTLs that match how fast different kinds of information go stale:
- Pricing data: 30 days
- Personnel/org changes: 90 days
- Policy documents: 180 days
- Architecture decisions: 365 days
At retrieval time, a reasonable starting point: weight semantic relevance at ~70% and a freshness score at ~30%. Use exponential decay so the freshness score halves at the TTL boundary. Tune the weights to your domain. The exact ratio matters less than having any freshness signal at all. The pricing document from 18 months ago stops outranking this week’s update.
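The blended score is a few lines. A sketch using the TTLs and starting weights above (the content-type keys are illustrative):

```python
# TTLs from the text, in days, keyed by content type
TTL_DAYS = {
    "pricing": 30,
    "personnel": 90,
    "policy": 180,
    "architecture": 365,
}

def freshness(age_days, ttl_days):
    """Exponential decay calibrated so the score halves at the TTL boundary."""
    return 0.5 ** (age_days / ttl_days)

def retrieval_score(semantic, age_days, content_type,
                    w_semantic=0.7, w_fresh=0.3):
    """Blend semantic relevance with freshness. Weights are a starting
    point to tune per domain, not a recommendation."""
    return w_semantic * semantic + w_fresh * freshness(age_days, TTL_DAYS[content_type])
```

With this scoring, an 18-month-old pricing document with near-perfect semantic relevance loses to this week’s slightly-less-relevant update, which is exactly the behavior the text calls for.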
Serving. MCP servers expose the entire system to AI agents. Claude, Cursor, custom agents, anything that speaks MCP can query your organization’s knowledge, structure, and decision history.
What Happens When Agents Start Remembering
An agent handles a renewal for a mid-tier customer. It pulls the account history from the CRM. Finds two P1 incidents in the last quarter. Checks the knowledge graph for similar accounts. Sees that the customer’s industry is expanding. Proposes a 15% discount with a 2-year lock-in.
Langfuse captures every step of that reasoning as a trace. What the agent retrieved, what it weighed, what it decided, what it discarded.
Now a pipeline picks up that trace and writes it to the knowledge graph. New nodes: Decision(15% discount), linked to Account(customer), linked to Conditions(P1 incidents, industry growth), linked to Person(agent-initiated, human-approved). Next time any agent handles a similar renewal, it doesn’t start from scratch. It has precedent.
The context system builds itself as a byproduct of agents doing their jobs. Nobody has to log anything. The agents do it by working.
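Since nobody runs this loop in production yet, there is no canonical implementation to copy, but the trace-to-graph step might look like this sketch. The trace shape, edge names, and node labels are all assumptions for illustration, not Langfuse’s schema:

```python
def trace_to_graph(trace):
    """Turn one agent decision trace into graph edges.

    Each edge is ((label, name), relationship, (label, name)),
    ready to be merged into the knowledge graph.
    """
    decision = ("Decision", trace["decision"])
    edges = [(decision, "FOR_ACCOUNT", ("Account", trace["account"]))]
    for cond in trace["conditions"]:
        edges.append((decision, "BASED_ON", ("Condition", cond)))
    edges.append((decision, "APPROVED_BY", ("Person", trace["approver"])))
    return edges

# The renewal example from the text, as a hypothetical trace
renewal_trace = {
    "decision": "15% discount, 2-year lock-in",
    "account": "mid-tier customer",
    "conditions": ["two P1 incidents last quarter", "industry expanding"],
    "approver": "human reviewer",
}
```

The point of the sketch: the conditions the agent weighed become first-class nodes, so the next agent can retrieve not just the decision but what it was based on.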
Nobody is running this full loop in production yet, as far as I can tell. But the pieces exist: Langfuse captures traces, Graphiti ingests them into a temporal graph, MCP serves them back to agents. It’s real engineering, not a feature you toggle on. But the companies wiring these pieces together now will have organizational memory that compounds. The rest will still be exporting Slack logs when they finally start.
Keeping It Alive
Everything above is useless if you build it and let it rot.
This is where most organizational knowledge systems die. The architecture was fine. Nobody maintained it. The knowledge graph that was accurate in January silently becomes a liability by June. The decision log that nobody reviews starts serving outdated rationale to agents that don’t know the difference.
Context degrades in ways that are hard to see and easy to ignore. Facts become false. Relationships change. Decisions get superseded but never marked as such. The system doesn’t break. It just starts being confidently wrong.
If you’re building any of this, read On Context first. It covers why context degrades, what the failure modes look like, and how decay scoring works. Everything here assumes you’re managing for staleness, not just accuracy.
The practical defenses:
Assign TTLs by content type. Pricing: re-verify every 30 days. Personnel: every 90. Policy: every 180. Architecture decisions: annually. These match how fast different kinds of organizational information actually changes.
Weight freshness at retrieval. Don’t just return the most semantically relevant result. A slightly less relevant but recent document is almost always more useful than a perfectly relevant but outdated one.
Automate the staleness scan. Weekly or monthly, sweep the knowledge base for anything past its TTL. Flag it for re-verification or human review. The system should nag, not silently serve stale context.
If you built Tier 2 with a temporal graph, contradicting facts get superseded automatically. But relevance decay, information that’s still true but no longer useful, needs the TTL layer on top.
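The staleness sweep itself is trivial once items carry a content type and a last-verified date. A minimal sketch, assuming that metadata exists (the item shape is illustrative):

```python
from datetime import date, timedelta

# Re-verification TTLs from the text, in days
TTL_DAYS = {"pricing": 30, "personnel": 90, "policy": 180, "architecture": 365}

def stale_items(items, today):
    """Flag anything past its content-type TTL for re-verification.

    Run weekly or monthly; the output feeds a nag, not a delete.
    """
    flagged = []
    for item in items:
        ttl = timedelta(days=TTL_DAYS[item["type"]])
        if today - item["last_verified"] > ttl:
            flagged.append(item)
    return flagged
```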
Before You Build
Everything above assumes you’re building. But first: does someone already sell this?
Gemini Enterprise builds a per-customer knowledge graph linking people, content, and interactions across your connected data sources. If your company runs on Google Workspace, you already have structural context (Layer 2) feeding into it: reporting lines, direct reports, top coworkers, document ownership. It won’t capture decision reasoning from Slack, but it covers organizational relationships natively. Starting at $30/user/month.
Glean ($7B valuation, 100+ enterprise connectors) built an “Enterprise Graph” that indexes Slack, CRM, Jira, and documents with permission-aware search and identity resolution. It’s a managed version of Tier 2-3 starting around $50/user/month.
Atlassian Intelligence, Guru, Notion AI are all adding knowledge graph and agent capabilities on top of tools teams already use.
Where building wins: decision trace capture (Tier 1) and agent context as exhaust. No SaaS vendor does either well today. The Slack bot that logs the why behind decisions, the pipeline that turns agent reasoning into organizational memory: you can only get those by building. And buying doesn’t give you control over the ontology, the extraction prompts, or the freshness scoring. If your competitive advantage lives in how your organization operates, not just what it knows, owning the stack matters.
For the layers vendors do cover:
| If you already have… | Consider… | Before building… |
|---|---|---|
| Google Workspace | Gemini Enterprise | Tier 2 |
| Budget for SaaS + 200+ seats | Glean | Tier 2-3 |
| Confluence/Jira heavily | Atlassian Intelligence | Tier 2 |
What does this cost? Rough framing for a 200-person company:
- Tier 1 (decision logging): 2-5 days of engineering time. Minimal infra cost. A webhook, LLM API calls (~$50-200/month), a database you probably already run.
- Tier 2 (knowledge graph): A few weeks of engineering to get a working prototype. Production-grade: 1-3 months. Infra adds graph database hosting.
- Tier 3 (full context system): 3-6 months of a small data engineering team. $50-200K+ TCO over the first year including infra, LLM API costs for continuous extraction, and ongoing maintenance.
- Glean at 200 seats: ~$10,000+/month.
Tier 1 is a no-brainer. The cost is trivial and no vendor covers it. Start there. Tier 2 is worth starting only if you can commit an engineer to it for three months. If you can’t sustain that, buy Glean or try Gemini Enterprise. Don’t start Tier 3 until Tier 2 is running and you’ve seen what breaks.
Where to Start
Every company has access to the same frontier models. They’re commoditizing fast. What’s not commoditizing: the understanding of how your specific organization actually works. The decisions, the exceptions, the reasoning that connects data to action. Karpathy popularized the underlying skill as “context engineering.” Gartner positioned it as replacing prompt engineering. Two years optimizing how we talk to models. The bottleneck was always what we feed them.
You don’t need an enterprise initiative to start. The decision trace pattern is a webhook and an LLM call. Graphiti ships with Docker compose. n8n has working templates for meeting enrichment. You could have Tier 1 running by Friday.
Start with decisions. Capture the why, not just the what. The meeting where nobody remembers? That’s not a model problem. It’s a system you haven’t built yet.
Go Deeper
- Foundation Capital: Context Graphs, AI’s Trillion-Dollar Opportunity
- HBR: When Every Company Can Use the Same AI Models, Context Becomes a Competitive Advantage
- BCG: The Widening AI Value Gap
- Anthropic: Effective Context Engineering
- Zep/Graphiti: Temporal Knowledge Graph Architecture
- GraphRAG (Microsoft Research)
- LightRAG: Full-Stack Graph RAG
- Mem0: Agent Memory with Graph Backend
- Cognee: Lightweight Knowledge Engine
- Glean: Enterprise Graph
- Gemini Enterprise: Knowledge Graph
- On Context: Why AI Needs to Learn to Forget