AgentOps · Workflow Reliability · AI Agents · Product UX · Automation

The Compaction UX: How to Keep Long-Running AI Agent Workflows From Forgetting

nNode Team · 13 min read

If you’ve ever run an AI agent workflow that worked beautifully… and then slowly got weird, you’ve seen the most common failure mode in production AgentOps:

  • It didn’t crash.
  • It didn’t throw an error.
  • It just drifted—forgot a constraint, contradicted an earlier decision, or started inventing details.

That drift is often caused by compaction: the moment your system compresses a long run (messages, tool calls, intermediate reasoning, results, constraints) so it can keep going inside a limited context window.

This post is a practical playbook for compaction UX for AI agents—not just “better prompts,” but the product and workflow patterns that keep long-running automations reliable for operators.

We’ll cover:

  • What compaction is (in workflow terms)
  • Why it breaks business-critical runs (outreach, publishing, service ops)
  • The Compaction UX patterns your product should show users
  • Engineering patterns that make compaction safe
  • A ship-ready checklist you can apply today

Along the way, we’ll connect this to how nNode’s Endnode thinks about automation: start in blackbox (agentic) mode, then convert what works into a structured workflow—and treat compaction as a first-class event instead of a silent internal trick.


1) The real failure mode: it didn’t fail loudly—it drifted

Here are three “it worked once, then got weird” stories that show up constantly:

Example A: Outreach sequence drift

You told your agent:

  • “Don’t claim we’ve worked with Fortune 500 companies.”
  • “Target founders of boutique agencies.”
  • “Use a direct, not-salesy tone.”

The first 10 drafts are great. Two days later (after lots of lead enrichment, scraping, and revisions), your agent starts writing:

  • “We’ve helped enterprise teams…”
  • “As a proven leader in the space…”

No malice—just lost constraints.

Example B: Publishing workflow forgets formatting rules

You asked:

  • “Use H2s and short paragraphs.”
  • “Include two diagrams.”
  • “Don’t overuse bolding.”

After a few cycles of research → drafting → editing → publishing, the workflow starts producing walls of text, inconsistent headings, and missing images.

Example C: Service ops run loses identity

A travel ops workflow starts with:

  • Client: “Jordan”
  • Budget: “<$750”
  • Preferred airline: “Delta”

Then it suggests:

  • $1,400 flights, wrong traveler name, or mismatched dates

The system didn’t break. It forgot who/what it was serving.


2) What “compaction” actually is (in workflow terms)

Most teams talk about this as “summarization.” But compaction is more specific:

Compaction is a workflow event where the system intentionally reduces the amount of state carried in-context, while trying to preserve the parts needed to continue execution correctly.

Compaction happens because:

  • context windows are bounded
  • long-running workflows produce huge tool traces (logs, pages, docs)
  • multi-agent runs create lots of intermediate artifacts

In a workflow platform, compaction isn’t just “summarize chat history.” It can include:

  • compressing the run log
  • compressing retrieved docs
  • trimming intermediate outputs
  • replacing verbose content with pointers (IDs, URLs, file refs)
  • snapshotting state into structured fields

Why compaction is different from normal summarization

Summarization is about readability.

Compaction is about continued correctness under constraint.

And because compaction often happens automatically (or invisibly), it creates a uniquely dangerous UX gap:

  • The system changes what it “remembers.”
  • The user doesn’t see the change.
  • The workflow continues executing.

That’s how you get silent drift.


3) Why compaction breaks real business workflows

When compaction goes wrong, it usually drops one (or more) of these categories.

3.1 Constraint loss (non-negotiables)

The agent forgets:

  • brand voice rules (“no hype,” “no lying,” “short sentences”)
  • legal/compliance constraints
  • budgets and thresholds
  • “don’t do X” instructions

3.2 Identity loss (who/what is this run about?)

The agent loses:

  • which client / which campaign
  • which product / which offer
  • which document is the source of truth

3.3 Decision loss (why did we choose Option A?)

The agent forgets:

  • which options were considered
  • what tradeoff was accepted
  • what was explicitly rejected

3.4 Provenance loss (what justified the answer?)

The agent stops being able to point back to:

  • which email thread
  • which Notion page
  • which file in Drive
  • which CRM record

That last one matters because operator trust isn’t built on “sounds right.” It’s built on traceability.


4) The Compaction UX: what the product must show the user

If you want operators (not just developers) to run automations for days or weeks, compaction must be visible.

Here’s the simplest mental model:

  • A workflow run has a timeline.
  • Compaction is an event in that timeline.
  • The user can inspect what changed.
  • The user can restore or re-run safely.

Diagram 1: Treat compaction as a first-class run event

flowchart TD
  A[Run starts] --> B[Steps execute: tools, drafts, files]
  B --> C{Context budget near limit?}
  C -->|No| B
  C -->|Yes| D[COMPACTION EVENT]
  D --> E[State snapshot created]
  D --> F[Run log compressed]
  D --> G[Pinned facts preserved]
  E --> H[Continue execution]
  F --> H
  G --> H

Pattern #1: A “Compaction happened” banner in the run timeline

Operators should see:

  • When compaction happened
  • Why it happened (context budget exceeded, document payload too large, etc.)
  • What it compacted (run log, documents, memory)

Minimal UI copy (human-friendly):

  • “We compressed this run to keep it executing. Review pinned facts and diffs before continuing.”
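A minimal event record is enough to render this banner. The fields below (`at`, `reason`, `targets`) are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CompactionEvent:
    """The data a run-timeline banner needs; fields are illustrative, not a fixed schema."""
    at: datetime        # when compaction happened
    reason: str         # why: e.g. "context budget exceeded"
    targets: list[str]  # what was compacted: run log, documents, memory

    def banner(self) -> str:
        # Render the human-friendly banner copy from the event fields.
        what = ", ".join(self.targets)
        return f"Compacted {what} at {self.at:%H:%M UTC} ({self.reason})."
```

Logging one of these per compaction also gives you the audit trail the later sections rely on.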

Pattern #2: Before/after snapshot + diff view (what changed?)

Compaction should create a before snapshot and an after snapshot.

The UI should show:

  • fields removed
  • fields added
  • fields changed

In practice, you won’t diff the entire conversation. You’ll diff typed workflow state (more on that later).
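A sketch of what such a diff can look like over typed state, assuming snapshots are nested dicts (the `diff_state` helper name is hypothetical):

```python
def diff_state(before: dict, after: dict, path: str = "") -> list[str]:
    """Return human-readable changes between two state snapshots."""
    changes = []
    for key in sorted(set(before) | set(after)):
        loc = f"{path}.{key}" if path else key
        if key not in after:
            changes.append(f"removed: {loc}")
        elif key not in before:
            changes.append(f"added: {loc}")
        elif isinstance(before[key], dict) and isinstance(after[key], dict):
            # Recurse into nested state (account, pinned_facts, ...).
            changes.extend(diff_state(before[key], after[key], loc))
        elif before[key] != after[key]:
            changes.append(f"changed: {loc} ({before[key]!r} -> {after[key]!r})")
    return changes
```

The output maps directly onto the "removed / added / changed" UI above, which is far easier to review than a diff of raw chat text.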

Pattern #3: A “Pinned facts” panel (never-summarize)

Pinned facts are non-negotiables. Examples:

  • client identity
  • budgets
  • tone constraints
  • “do not claim X” rules
  • outbound safety requirements (draft-only, human approval)

Pinned facts should be:

  • visible
  • editable (with audit trail)
  • referenced by downstream steps

Pattern #4: An “Open decisions” panel (what’s unresolved?)

Many workflows fail because compaction collapses unresolved questions into vague summaries.

Make open decisions explicit:

  • “Which offer do we lead with?”
  • “Is the goal booked calls or demo signups?”
  • “Should we prioritize deliverability or speed?”

When a decision is unresolved, your workflow should pause, not hallucinate.

Pattern #5: One-click “restore last full snapshot” and “re-run from step N”

Operators need escape hatches:

  • Restore snapshot: roll back the run’s effective memory/state
  • Re-run from step N: replay from a known-good checkpoint

This is where workflow-native platforms win: they can treat AI runs like software execution.
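One minimal way to back both escape hatches, assuming snapshots are plain dicts keyed by step number (the `RunCheckpoints` name is illustrative):

```python
import copy

class RunCheckpoints:
    """Keeps full state snapshots so operators can restore or replay from step N."""

    def __init__(self):
        self._snapshots: dict[int, dict] = {}

    def checkpoint(self, step: int, state: dict) -> None:
        # Deep-copy so later mutation of the live state can't corrupt history.
        self._snapshots[step] = copy.deepcopy(state)

    def restore_last(self) -> dict:
        """Roll back to the most recent full snapshot."""
        return copy.deepcopy(self._snapshots[max(self._snapshots)])

    def rerun_from(self, step: int) -> dict:
        """Replay from the last known-good checkpoint at or before `step`."""
        eligible = [s for s in self._snapshots if s <= step]
        return copy.deepcopy(self._snapshots[max(eligible)])
```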


5) Engineering patterns that make compaction safe

The biggest shift is this:

Stop treating “memory” as a blob of text. Treat it as workflow state.

5.1 State snapshots: typed JSON beats free-text memory

Instead of a single “summary,” store a structured snapshot at key points.

Example: a minimal state shape for an outreach + publishing workflow.

{
  "run_id": "run_2026_03_28_001",
  "account": {
    "company_name": "White City AI",
    "offer": "Done-for-you automations",
    "website": "(stored as a pointer, not pasted pages)"
  },
  "audience": {
    "icp": "Boutique agency founders",
    "exclusions": ["enterprise-only", "students"]
  },
  "pinned_facts": {
    "tone": ["direct", "non-hype"],
    "truthfulness": "Do not invent case studies or client logos",
    "draft_only": true
  },
  "open_decisions": [
    {"id": "lead_offer", "question": "Which offer angle to lead with?", "status": "open"}
  ],
  "assets": {
    "blog_post_doc_id": "drive:file:abc123",
    "lead_list_sheet_id": "sheets:xyz789"
  },
  "history": {
    "key_choices": [
      {"timestamp": "2026-03-28T14:10:00Z", "choice": "Use short first lines", "reason": "deliverability + clarity"}
    ]
  }
}

Now compaction becomes safer because your system can compress chatty traces while preserving the state that actually matters.

5.2 Memory anchors: never-summarize fields

A memory anchor is a field (or set of fields) that compaction is not allowed to rewrite.

Examples:

  • pinned_facts.truthfulness
  • account.company_name
  • audience.icp
  • assets.blog_post_doc_id

That prevents the classic failure where a summary rewrites “don’t lie” into “be persuasive,” or swaps a client name.
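Anchors are cheap to enforce in code. A sketch, assuming the typed snapshot shape from 5.1 (the `ANCHOR_PATHS` list and helper names are illustrative):

```python
# Paths into the state snapshot that compaction must never rewrite.
ANCHOR_PATHS = [
    ("pinned_facts", "truthfulness"),
    ("account", "company_name"),
    ("audience", "icp"),
]

def get_path(state: dict, path: tuple) -> object:
    node = state
    for key in path:
        node = node[key]
    return node

def enforce_anchors(before: dict, after: dict) -> dict:
    """Copy anchored fields from the pre-compaction snapshot, overwriting any rewrite."""
    for path in ANCHOR_PATHS:
        value = get_path(before, path)
        node = after
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return after
```

Run this after every compaction: whatever the summarizer produced, the anchored values are restored verbatim.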

5.3 Retrieval pointers: link to sources instead of rewriting facts

Instead of stuffing long docs into context:

  • store a pointer (file ID, Notion page ID, email message ID)
  • store a short “why it matters” note
  • retrieve it on demand

This preserves provenance and keeps compaction smaller.
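A minimal pointer shape, following the `drive:file:...`-style refs from the snapshot example in 5.1 (names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class SourcePointer:
    """A reference to a source document; resolved on demand, never pasted into context."""
    ref: str             # e.g. "drive:file:abc123" or "notion:page:..."
    why_it_matters: str  # short note that compaction can keep cheaply

def context_entry(p: SourcePointer) -> str:
    """What survives compaction: the pointer plus the note, not the document body."""
    return f"[{p.ref}] {p.why_it_matters}"
```

When a later step actually needs the content, it resolves the ref through the relevant integration; the provenance trail stays intact either way.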

5.4 Output contracts: each step must produce a verifiable shape

When you rely on free-form text, compaction errors are hard to detect.

An output contract makes every step checkable.

Example (TypeScript):

export type DraftEmail = {
  subject: string;
  first_line: string;
  body_markdown: string;
  claims: Array<{ text: string; source_pointer?: string }>; // source_pointer required for risky claims
  safety: {
    draft_only: true;
    requires_approval: true;
  };
};

If a step outputs claims without sources, the workflow can pause before proceeding.

5.5 Idempotency + replay: re-run safely after compaction bugs

Long-running workflows need to be replayable.

That means:

  • tool calls should be idempotent where possible
  • side-effectful steps should require approvals
  • you should be able to re-run from step N with the same snapshot

This is a core “blackbox → workflow” advantage: once the agentic run is converted into structured steps, replay becomes a product feature.
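One common way to make replay safe is an idempotency key derived from the step name plus its input snapshot, so re-running a step after a compaction rollback never repeats a side effect. A hypothetical sketch:

```python
import hashlib
import json

def idempotency_key(step_name: str, snapshot: dict) -> str:
    """Same step + same input snapshot -> same key, so a replay never double-sends."""
    payload = json.dumps({"step": step_name, "state": snapshot}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class ToolRunner:
    def __init__(self):
        self._done: dict[str, object] = {}

    def run(self, step_name: str, snapshot: dict, tool):
        key = idempotency_key(step_name, snapshot)
        if key in self._done:
            # Replayed step: return the cached result, skip the side effect.
            return self._done[key]
        result = tool(snapshot)
        self._done[key] = result
        return result
```

In production you would persist `_done` (and still gate sends/publishes behind approvals), but the mechanism is the same.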


6) When to compact (and when not to): a simple policy

Compaction isn’t inherently bad. It’s necessary.

The goal is to do it intentionally, with user-visible UX and safety tiers.

A simple compaction policy (operator-friendly)

  • Compact early when the run is still low-risk and mostly exploratory.
  • Compact before high-risk actions, and force a checkpoint.
  • Never compact through a safety boundary without surfacing a review step.

Here’s a practical policy function you can implement in most workflow engines:

from dataclasses import dataclass

@dataclass
class RiskTier:
    name: str  # low, medium, high

@dataclass
class StepMeta:
    name: str
    risk: RiskTier
    tokens_estimate: int

MAX_TOKENS = 120_000
COMPACT_AT = 0.75  # 75% of budget

HIGH_RISK_STEPS = {"send_email", "book_travel", "publish_post", "charge_card"}

def should_compact(current_tokens: int, next_step: StepMeta) -> bool:
    near_limit = current_tokens >= int(MAX_TOKENS * COMPACT_AT)
    approaching_high_risk = next_step.name in HIGH_RISK_STEPS or next_step.risk.name == "high"

    # Compact when near limit, or when entering high-risk territory.
    return near_limit or approaching_high_risk

def requires_human_gate_after_compaction(next_step: StepMeta) -> bool:
    return next_step.name in HIGH_RISK_STEPS or next_step.risk.name == "high"

This is intentionally simple. The key is not the math. The key is: compaction becomes a policy + UX flow, not a hidden implementation detail.


7) How to test compaction like you test code

If you treat compaction as a core workflow feature, you can test it.

7.1 Golden tests for pinned facts

After compaction, pinned facts must still match.

def test_compaction_preserves_pinned_facts():
    before = {
        "pinned_facts": {
            "truthfulness": "Do not invent case studies or client logos",
            "draft_only": True,
            "tone": ["direct", "non-hype"],
        }
    }

    after = compact_state(before)  # your compaction function

    assert after["pinned_facts"] == before["pinned_facts"]

7.2 Regression tests on representative runs

Collect 10–20 “real” historical runs and replay them through compaction.

Check invariants like:

  • budgets never increase
  • identities never change
  • “draft-only” stays true
  • risky outputs still have sources

7.3 Compaction budgets

A long run that compacts 12 times is asking for drift.

Add a budget:

  • allow only N compactions per run
  • force a checkpoint after N
  • require a human review to continue
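The budget itself is a few lines, assuming a per-run compaction counter (`MAX_COMPACTIONS` is an illustrative default you should tune per workflow):

```python
MAX_COMPACTIONS = 3  # assumption: forced checkpoint after three compactions

def next_action(compaction_count: int) -> str:
    """Decide what happens when another compaction is requested."""
    if compaction_count < MAX_COMPACTIONS:
        return "compact"
    # Budget exhausted: force a checkpoint and require human review to continue.
    return "checkpoint_and_request_review"
```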

8) A practical implementation blueprint (Endnode-style)

If you’re building (or choosing) a workflow platform, you don’t need a perfect architecture to get value quickly.

Here’s a staged blueprint that aligns with how operator-first platforms (like nNode’s Endnode) evolve: start agentic, then formalize.

v1 (ship this first): visible compaction + pinned facts + restore

Minimum viable Compaction UX:

  • compaction event in timeline
  • pinned facts panel
  • restore last snapshot
  • re-run from step N

v2: typed workflow state + diff viewer + policy inheritance

Add:

  • state snapshot schema per workflow
  • diff viewer between snapshots
  • compaction policy per workflow (with defaults)
  • audit log of edits to pinned facts

v3: a “compaction QA agent” that validates invariants

When compaction happens:

  • run an internal QA step that checks invariants
  • block high-risk actions if checks fail
  • surface “what failed” in operator language

Example invariant report:

  • “Budget constraint missing.”
  • “Draft-only flag changed.”
  • “Client name differs from intake email.”

That’s the bridge from “agent magic” to “autonomous infrastructure.”


9) Checklist: Ship a compaction-safe workflow in 30 minutes

Use this checklist to make long-running agent workflows stable today—even if your system is still early.

Compaction UX checklist (product)

  • Compaction is a visible event in the run timeline.
  • The event shows what was compacted (log, docs, memory) and why.
  • Users can view a before/after state snapshot.
  • Users can see a diff of state (not just chat text).
  • There is a Pinned Facts panel.
  • There is an Open Decisions panel.
  • Users can restore the last full snapshot.
  • Users can re-run from step N.

Compaction safety checklist (engineering)

  • Workflow state is stored as typed JSON (or JSON Schema validated).
  • “Never-summarize” anchors are enforced in code.
  • Tool outputs are stored as pointers (IDs) + short notes, not pasted wholesale.
  • High-risk steps require approvals (send/publish/book/pay).
  • Steps have output contracts (schemas) that are validated.
  • Compaction triggers are policy-driven (threshold + risk-tier).
  • Compaction runs through golden tests for pinned facts.
  • There’s a compaction budget (max compactions before forced checkpoint).

FAQ: quick answers operators actually need

“Do I just need a bigger context window?”

It helps, but it doesn’t solve drift. Bigger windows delay the moment of compaction; they don’t eliminate the need for state, checkpoints, and user-visible control.

“Is this just a RAG problem?”

Not really. Retrieval helps with facts, but compaction failures are often about constraints, decisions, and identity—things you should model as workflow state, not re-retrieve as text.

“What’s the simplest thing I can do if I’m not technical?”

Insist your workflows expose:

  • pinned facts
  • approval gates
  • a run timeline with compaction events
  • restore/replay

Even without touching code, those features prevent most “it got weird” failures.


Closing: compaction is inevitable—silent compaction is optional

Long-running agent workflows will always hit limits. The question is whether your system:

  • hides compaction and hopes for the best, or
  • treats compaction as a first-class workflow event—visible, testable, reversible

That’s the core idea behind the Compaction UX.

If you’re exploring workflow automation that starts agentic but becomes reliable infrastructure over time, take a look at Endnode from nNode—built around converting blackbox work into structured workflows, with operator-grade controls for stability.

Soft next step: visit nnode.ai and see how compaction-safe runs can look when workflows are treated like real production systems.

Build your first AI Agent today

Join the waiting list for nNode and start automating your workflows with natural language.

Get Started