Agentic Playbook #17

The Token Economy: Part 2

Reduce Waste Across the Agent Lifecycle

Token efficiency is not only about shorter responses. It comes from reducing unnecessary code, repeated explanations, duplicated context, vague prompts, and avoidable rework across the full agent lifecycle.

3 Jul 2026 • 6 min read • Leadership • article

Introduction

In Part 1 of The Token Economy, I explored a simple idea:

As software teams move into always-on agentic workflows, tokens are becoming part of the real cost of delivery.

Not just because models produce output.

But because agents read repositories, inspect files, run tools, reason through plans, generate code, summarise progress, review changes, and sometimes repeat the same context-heavy work several times across a single task.

Part 1 focused on spending tokens where thinking matters: token-aware session framing, Caveman-style concise communication, context budgeting, scoped retrieval, and model-tiered execution.

Part 2 is about the next layer.

How do we reduce waste across the wider agent lifecycle?

Because token efficiency is not only about asking for fewer words.

It is about reducing unnecessary code, repeated explanations, duplicated context, vague prompts, and avoidable rework.

In other words, the cheapest token is not always the one we compress.

Sometimes it is the one we never needed to generate.

1. Use Ponytail to Reduce Code, Not Just Output

One of the most interesting recent examples in this space is Ponytail.

The idea is wonderfully simple: make the agent behave like the laziest senior developer in the room.

Not lazy as in careless.

Lazy as in experienced enough to know that the best code is often the code you never write.

That framing matters because many AI coding agents naturally drift toward implementation. Ask for a solution and they will often create a helper, introduce an abstraction, install a dependency, wrap a platform feature, generate tests, add documentation, and explain the whole thing back to you.

Sometimes that is useful.

Often, it is waste.

Ponytail pushes the agent to ask a more senior question first:

Does this need to exist at all?

Then:

Is there already something in the codebase that does this?

Then:

Can the standard library solve it?

Then:

Does the platform already provide this?

Then:

Can this be one line?

Only after that does the agent reach for new code.
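The middle rungs of that ladder are easy to see in a small Python example. The hand-rolled helper below is a hypothetical stand-in for what an implementation-first agent might generate. `collections.Counter` is part of Python's standard library and does the same job in one line.

```python
from collections import Counter


# Hypothetical helper an implementation-first agent might produce:
# new code that now has to be reviewed, tested, and maintained.
def count_words(words):
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


words = ["deploy", "review", "deploy", "test", "deploy"]

# The "can the standard library solve it?" answer: one line, nothing new to own.
counts = Counter(words)

# Same result either way; only one version adds code to the repository.
assert count_words(words) == dict(counts)
print(counts.most_common(1))  # [('deploy', 3)]
```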

That is token economy thinking at its best.

Because the cost of unnecessary code is not limited to the tokens used to generate it.

You also pay for it through review time, test maintenance, future debugging, dependency risk, architectural complexity, and the cognitive load placed on every engineer who encounters it later.

This connects directly to Reviewability-Driven Development.

Smaller code is usually easier to review.

Simpler code is usually easier to reason about.

Less code means fewer future tokens spent explaining why the code exists.

The strongest token-saving strategy is not always compression.

Sometimes it is deletion.

2. Summarise Long Sessions Into State Files

Long-running agent sessions can become expensive very quickly.

As conversation history grows, the agent carries more context forward. Some of that context is useful. Much of it becomes stale, duplicated, or overly detailed.

A practical pattern I have started to appreciate is creating compact state files at meaningful checkpoints.

For example:

# Agent Session State
Goal:
Current decision:
Files changed:
Important constraints:
Open risks:
Commands run:
Next step:

Instead of carrying the entire conversation forever, the agent periodically compresses the useful state into a small, structured summary.

Then the next session can restart from that state rather than from a long transcript.
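A minimal Python sketch of the checkpoint pattern, assuming the field labels from the template above and a hypothetical `SESSION_STATE.md` file name. It renders the state into the compact format, then parses it back so a fresh session can start from the file rather than the transcript. The one-line-per-field layout and the `; ` separator for list fields are illustrative choices, not a standard.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass, field
from pathlib import Path

# Field name -> label used in the checkpoint file (order is preserved).
LABELS = {
    "goal": "Goal",
    "current_decision": "Current decision",
    "files_changed": "Files changed",
    "constraints": "Important constraints",
    "open_risks": "Open risks",
    "commands_run": "Commands run",
    "next_step": "Next step",
}
# Fields holding several items; rendered on one line, separated by "; ".
LIST_FIELDS = {"files_changed", "constraints", "open_risks", "commands_run"}


@dataclass
class SessionState:
    goal: str = ""
    current_decision: str = ""
    files_changed: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    open_risks: list[str] = field(default_factory=list)
    commands_run: list[str] = field(default_factory=list)
    next_step: str = ""

    def render(self) -> str:
        """Compress the useful state into a small, structured checkpoint."""
        lines = ["# Agent Session State"]
        for name, label in LABELS.items():
            value = getattr(self, name)
            text = "; ".join(value) if name in LIST_FIELDS else value
            lines.append(f"{label}: {text}")
        return "\n".join(lines) + "\n"

    @classmethod
    def parse(cls, text: str) -> SessionState:
        """Rebuild the state from a checkpoint (assumes items contain no '; ')."""
        names = {label: name for name, label in LABELS.items()}
        state = cls()
        for line in text.splitlines():
            label, sep, value = line.partition(": ")
            if not sep or label not in names:
                continue  # skip the title line and anything unrecognised
            name = names[label]
            if name in LIST_FIELDS:
                setattr(state, name, [item for item in value.split("; ") if item])
            else:
                setattr(state, name, value)
        return state


state = SessionState(
    goal="Add rate limiting to the public API",
    current_decision="Reuse the existing middleware instead of adding a dependency",
    files_changed=["api/middleware.py", "tests/test_rate_limit.py"],
    open_risks=["Limits not yet load-tested"],
    next_step="Run the integration suite",
)

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "SESSION_STATE.md"  # hypothetical file name
    checkpoint.write_text(state.render(), encoding="utf-8")
    restored = SessionState.parse(checkpoint.read_text(encoding="utf-8"))

assert restored == state
print(state.render())
```

The rendered file is short enough to paste into a new session as its opening context, which is the point: the next session inherits the decisions, not the conversation.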

This has two benefits.

First, it reduces token usage.

Second, it forces clarity.

The agent has to identify what actually matters: the goal, the decisions, the changed files, the risks, and the next step.

That is useful for the model.

It is also useful for the human.

In many ways, session state files are a lightweight form of engineering memory. They create continuity without dragging the full weight of the conversation behind them.

For teams using agents heavily, this may become an important practice.

Not because every session needs documentation.

But because long-running agent work needs checkpoints.

Otherwise, context becomes a swamp.

3. Store Stable Knowledge Once

If you keep repeating the same instruction to an agent, that is a signal.

It probably belongs somewhere persistent.

Stable knowledge should not live only in chat.

It should live in the repository, where agents and humans can reuse it.

That might mean:

• AGENTS.md
• CLAUDE.md
• .cursor/rules
• architecture decision records
• engineering standards
• code review rules
• testing guidance
• security expectations
• repository-specific skills
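For a team running its own agent harness, "store once, reference consistently" can be as small as collecting these files into one context block at the start of each session. Several agent tools already pick up files like AGENTS.md, CLAUDE.md, or .cursor/rules on their own, so treat this as a sketch for custom tooling only. The `load_stable_knowledge` function and the `docs/adr` location are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical locations for stable, repository-owned agent guidance.
INSTRUCTION_FILES = ["AGENTS.md", "CLAUDE.md"]
INSTRUCTION_DIRS = [".cursor/rules", "docs/adr"]


def load_stable_knowledge(repo_root: Path) -> str:
    """Collect persistent guidance into one labelled context block."""
    candidates = [repo_root / name for name in INSTRUCTION_FILES]
    for directory in INSTRUCTION_DIRS:
        folder = repo_root / directory
        if folder.is_dir():
            # Sorted so the assembled context is stable across runs.
            candidates.extend(sorted(p for p in folder.rglob("*") if p.is_file()))

    sections = []
    for path in candidates:
        if path.is_file():
            label = path.relative_to(repo_root).as_posix()
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {label}\n{body}")
    return "\n\n".join(sections)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text(
        "Prefer the standard library over new dependencies.\n", encoding="utf-8"
    )
    rules = root / ".cursor" / "rules"
    rules.mkdir(parents=True)
    (rules / "testing.mdc").write_text(
        "Every bug fix needs a regression test.\n", encoding="utf-8"
    )
    context = load_stable_knowledge(root)

print(context)
```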

This connects closely to my earlier article, Why Skills Outlive Models.

Models will continue to change.

But the durable asset is the knowledge your team packages around them: architectural principles, coding standards, review habits, domain constraints, and delivery expectations.

From a token economy perspective, this is extremely important.

Every repeated instruction is a small recurring cost.

Every missing rule creates a chance for the agent to guess.

Every guess creates the possibility of rework.

The more stable knowledge you encode once, the less you need to re-explain it in every session.

This does not mean agents should blindly follow static rules.

Rules can become stale.

They need ownership, review, and versioning.

But the principle remains strong:

If it matters often, do not keep prompting it manually.

Store it once.

Reference it consistently.

Improve it over time.

That is how teams move from individual prompting habits to reusable engineering capability.
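Because stored rules need ownership, review, and versioning, some of that hygiene can be checked mechanically. A minimal sketch, assuming a hypothetical convention in which each rule file carries `Owner:` and `Last reviewed: YYYY-MM-DD` lines. The `rule_problems` function and the 90-day threshold are arbitrary illustrations, not a recommended policy.

```python
from __future__ import annotations

import re
from datetime import date, timedelta

# Hypothetical header convention for rule files.
OWNER = re.compile(r"^Owner:[ \t]*\S", re.MULTILINE)
REVIEWED = re.compile(r"^Last reviewed:[ \t]*(\d{4}-\d{2}-\d{2})[ \t]*$", re.MULTILINE)


def rule_problems(name: str, text: str, today: date, max_age_days: int = 90) -> list[str]:
    """Flag rule files with no owner, no review date, or a stale review date."""
    problems = []
    if not OWNER.search(text):
        problems.append(f"{name}: no owner")
    match = REVIEWED.search(text)
    if match is None:
        problems.append(f"{name}: never reviewed")
    elif today - date.fromisoformat(match.group(1)) > timedelta(days=max_age_days):
        problems.append(
            f"{name}: last reviewed {match.group(1)}, older than {max_age_days} days"
        )
    return problems


rules = {
    "AGENTS.md": "Owner: platform-team\nLast reviewed: 2026-06-01\nPrefer small diffs.\n",
    "CLAUDE.md": "Prefer small diffs.\n",
    "docs/testing.md": "Owner: qa-guild\nLast reviewed: 2025-12-01\nTest public behaviour.\n",
}
today = date(2026, 7, 3)
report = [p for name, text in rules.items() for p in rule_problems(name, text, today)]
print("\n".join(report))
```

Run in CI, a check like this turns "rules can become stale" from a worry into a visible, owned task.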

4. Treat Spec-Driven Development as Cost Control

In my article on Spec-Driven Development, I argued that prompt-first development does not scale.

That point becomes even stronger when viewed through the token economy.

Vague prompts are expensive.

They lead to clarification loops, incorrect assumptions, over-generated code, repeated corrections, and long explanations about why the first attempt missed the mark.

A strong specification reduces that waste.

It gives the agent a clearer boundary:

• what the system must do
• what it must not do
• which behaviours must be preserved
• which edge cases matter
• which constraints are non-negotiable
• how success will be validated
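One way to keep those boundaries explicit is to capture them as structured data before handing the task to an agent. The `TaskSpec` class below is a hypothetical sketch, not a Spec-Driven Development tool: it renders the sections into a prompt and reports which boundaries were left empty, which is exactly where an agent would otherwise have to guess. The example requirements are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# (field name, heading) pairs mirroring the boundaries a good spec sets.
SECTIONS = [
    ("must", "The system must"),
    ("must_not", "The system must not"),
    ("preserve", "Behaviours to preserve"),
    ("edge_cases", "Edge cases that matter"),
    ("constraints", "Non-negotiable constraints"),
    ("validation", "Success is validated by"),
]


@dataclass
class TaskSpec:
    must: list[str] = field(default_factory=list)
    must_not: list[str] = field(default_factory=list)
    preserve: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    validation: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Headings with no content: the places the agent would have to guess."""
        return [heading for name, heading in SECTIONS if not getattr(self, name)]

    def to_prompt(self) -> str:
        """Render the non-empty sections as a compact, bulleted prompt."""
        blocks = []
        for name, heading in SECTIONS:
            items = getattr(self, name)
            if items:
                blocks.append(heading + ":\n" + "\n".join(f"- {item}" for item in items))
        return "\n\n".join(blocks)


spec = TaskSpec(
    must=["Reject uploads larger than 10 MB with HTTP 413"],
    must_not=["Add new third-party dependencies"],
    preserve=["Existing upload URLs and response format"],
    validation=["A unit test at exactly 10 MB and one just above it"],
)
print(spec.missing_sections())  # ['Edge cases that matter', 'Non-negotiable constraints']
print(spec.to_prompt())
```

Filling in the two missing sections before the agent starts is cheaper than discovering them through a correction loop.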

That is not just better engineering.

It is better cost control.

A vague prompt may look cheaper because it is shorter.

But if it creates three rounds of rework, it was not cheaper at all.

A good specification may cost more upfront, but it reduces waste across the whole task.
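The arithmetic is easy to sketch. The numbers below are purely illustrative assumptions, not measurements: a 40-token vague prompt that needs three rounds of rework versus a 400-token specification that lands first time, with each attempt consuming about 6,000 tokens of context and output. The model is deliberately simplified and ignores conversation history growing across retries, which would only widen the gap.

```python
def task_tokens(prompt_tokens: int, tokens_per_attempt: int, attempts: int) -> int:
    """Simplified task cost: every attempt pays for the prompt plus its context and output."""
    return attempts * (prompt_tokens + tokens_per_attempt)


# Illustrative assumptions only.
CONTEXT_AND_OUTPUT = 6_000
vague = task_tokens(prompt_tokens=40, tokens_per_attempt=CONTEXT_AND_OUTPUT, attempts=4)  # 1 try + 3 reworks
specified = task_tokens(prompt_tokens=400, tokens_per_attempt=CONTEXT_AND_OUTPUT, attempts=1)

print(vague, specified)  # 24160 6400
```

Under these assumptions, the shorter prompt costs roughly 3.8 times as much, and avoiding even a single rework round (2 × 6,040 = 12,080 tokens) outweighs the longer specification.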

This is one of the most important mindset shifts in AI-assisted development.

The goal is not to minimise the first prompt.

The goal is to minimise unnecessary cycles.

Good specifications are not only quality tools.

They are token efficiency tools.

5. Use Output Contracts

One of the easiest ways to waste tokens is to let the agent decide the response shape every time.

Many models are trained to be helpful, explanatory, and conversational. That is useful in some contexts, but expensive and noisy in others.

A simple technique is to define an output contract before the task begins.

For example:

Return only:
- Summary
- Files changed
- Tests run
- Risks
- Next action

Or:

Do not explain obvious changes.
Explain only:
- trade-offs
- risks
- non-obvious decisions
- security-sensitive assumptions

This is especially useful for routine engineering tasks.

You do not always need a tutorial.

You often need a concise operational update.

Output contracts also improve reviewability because they create consistency. If every agent response follows a predictable structure, it becomes easier for engineers to scan, compare, and act.
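Because the contract is explicit, conformance can be checked automatically before a response reaches a reviewer. A minimal sketch, assuming responses use a `Label: value` layout with the five sections from the first contract. The `check_contract` function and its heading regex are illustrative, not part of any agent framework.

```python
from __future__ import annotations

import re

CONTRACT = ("Summary", "Files changed", "Tests run", "Risks", "Next action")
# A section heading is a capitalised label at the start of a line, followed by a colon.
HEADING = re.compile(r"^([A-Z][A-Za-z ]*):", re.MULTILINE)


def check_contract(response: str, contract: tuple[str, ...] = CONTRACT) -> list[str]:
    """Return contract violations; an empty list means the response conforms."""
    found = [match.group(1) for match in HEADING.finditer(response)]
    problems = [f"missing section: {name}" for name in contract if name not in found]
    problems += [f"unexpected section: {name}" for name in found if name not in contract]
    if not problems and found != list(contract):
        problems.append("sections repeated or out of order")
    return problems


conforming = """Summary: Fixed the null check in the invoice parser.
Files changed: billing/parser.py
Tests run: pytest tests/billing
Risks: None identified
Next action: Merge after review
"""

chatty = """Great question! Let me walk you through everything I did.
Summary: Fixed the null check.
Risks: None
"""

print(check_contract(conforming))  # []
print(check_contract(chatty))
```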

This matters when agents are used across teams.

A single verbose response is mildly annoying.

Hundreds of verbose agent responses across daily workflows become operational noise.

Output contracts help reduce that noise.

They keep the communication focused on what humans need to know next.

The Real Goal Is Not Token Reduction

There is a danger in taking this topic too literally.

The goal is not always to use fewer tokens.

Sometimes the right decision is to spend more.

Deep architectural reasoning is worth the cost.

Security-sensitive review is worth the cost.

Complex migration planning is worth the cost.

Explaining a difficult concept to a junior engineer is worth the cost.

The problem is not token usage.

The problem is undisciplined token usage.

In the same way that cloud cost optimisation should not mean starving production systems, token optimisation should not mean weakening engineering judgment.

The goal is to spend deliberately.

Use more tokens when they create understanding.

Use fewer when they create noise.
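Spending deliberately is easier when the decision is written down once rather than made ad hoc in every session. A minimal sketch, assuming a hypothetical set of task categories taken from the "worth the cost" examples above. How a particular agent tool is switched into a deeper or leaner mode varies by tool and is not shown.

```python
from enum import Enum


class Effort(Enum):
    DEEP = "deep"  # allow extended reasoning and fuller explanation
    LEAN = "lean"  # concise work under an output contract


# Hypothetical categories drawn from the "worth the cost" examples.
WORTH_THE_COST = {"architecture", "security-review", "migration-planning", "mentoring"}


def effort_for(task_kind: str) -> Effort:
    """Spend more where tokens create understanding, less where they create noise."""
    return Effort.DEEP if task_kind in WORTH_THE_COST else Effort.LEAN


for kind in ("security-review", "rename-variable", "migration-planning"):
    print(kind, effort_for(kind).value)
```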

Final Thought

The token economy is not just a billing conversation.

It is an engineering discipline conversation.

Always-on agentic workflows are changing how software is planned, written, reviewed, tested, and maintained. That means the cost of context, reasoning, output, rework, and unnecessary code will increasingly shape how teams operate.

Part 1 was about spending tokens where thinking matters.

Part 2 is about reducing waste across the lifecycle.

Together, they point toward a simple operating principle:

Do not optimise for maximum agent activity.

Optimise for useful engineering progress.

Because the future of AI productivity will not belong to the teams that use the most tokens.

It will belong to the teams that know where tokens actually create value.

Continue the Playbook

#16

The Token Economy: Part 1

Spend Tokens Where Thinking Matters

#14

Spec-Driven Development

Why Prompt-First Development Does Not Scale

#15

The Large PR Problem

Reviewability-Driven Development in the AI Era

#12

Why Skills Outlive Models

From Prompt Engineering to Reusable Agent Capabilities