I run Claude Code as my production assistant. Five servers, 80 skills, 25+ automation scripts, 51 memory files. The context window is 1 million tokens, and I was burning through it before lunch.
The morning briefing was the worst. The AI would boot up, check disk space on all five servers, check load averages, check services, replay the previous session, scan email, check git repos, build a report — all inline. Every command output, every file read, just piling into the context window. By the time I said “okay, let’s actually work,” it had already consumed ~147K tokens. That’s 15% of the window gone on startup.
So sessions were dying at 3-4 hours. The AI would get slower, lose track of which files it had already read, and re-read the same configs over and over. I was restarting it multiple times a day just to get through a deployment.
I started thinking about it differently. The usual advice is about managing context — compression, summarization, sliding windows. That’s all reactive. The context is already full by the time you’re compressing it. I wanted to stop the garbage from getting in there in the first place.
That’s what I’ve been calling context conservation. Here’s what I built.
Subagent Delegation
This was the biggest win. A subagent is a separate AI process with its own context window. It does work, burns through its own tokens, and only the summary comes back to the parent.
My morning briefing now spawns two Haiku subagents in parallel. One runs all the health checks — disk, load, memory, services, uptime monitors, access verification, crash analysis. The other replays the previous conversation JSONL and extracts unfinished work. The parent gets two structured summaries. Maybe 2K tokens total. The raw command output from 16+ bash calls and 40+ file reads never touches the parent context.
Haiku is the cheapest model, roughly a tenth the price of Opus. It's perfectly good at running commands and formatting output. You don't need deep reasoning to check disk space.
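For concreteness, Claude Code reads subagent definitions as Markdown files with YAML frontmatter under .claude/agents/. Here's a trimmed-down sketch of what a health-check agent can look like; the name, tool list, and prompt are simplified stand-ins, not my exact files:

```markdown
---
name: health-checker
description: Runs fleet health checks and returns one structured summary. Used by the morning briefing.
tools: Bash, Read
model: haiku
---

Run disk, load, memory, and service checks across the fleet.
Return a single structured summary: one status line per server,
plus any anomalies. Never return raw command output.
```

The model: haiku line is doing the cost work here. Every tool call the agent makes runs on the cheap model, inside the agent's own context window.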
I extended this into what I call the Report Builder pattern. My AI generates about 3 reports a day — morning briefing, afternoon check-in, wrap-up. Each one is 15-30KB of formatted HTML. The old way: the AI builds the HTML inline, 30K+ tokens consumed, writes it to a file, outputs a URL. The HTML sits in context forever even though nobody references it again.
Now the AI assembles structured data — about 3KB of labeled sections, KPI values, action items — and hands it to a Haiku subagent with a template. The subagent builds the HTML, publishes it, returns a URL. The parent never sees the report contents. That’s ~27K tokens saved per report. Three reports a day, roughly 80K tokens per day that stay out of the working context.
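The handoff payload is nothing fancy. A sketch of its shape in Python, with field names and values invented for illustration rather than copied from my actual schema:

```python
# Illustrative shape of the ~3KB structured payload handed to the
# report subagent. Field names and values are examples only.
report_data = {
    "report_type": "morning_briefing",
    "generated": "2025-06-02 07:05",
    "sections": [
        {"title": "Fleet Health", "status": "green",
         "items": ["all five servers under 70% disk", "load normal"]},
        {"title": "Carried-Over Work", "status": "yellow",
         "items": ["client deploy pending QA"]},
    ],
    "kpis": {"uptime_pct": 99.9, "open_action_items": 3},
}
# The Haiku subagent merges this into an HTML template, publishes
# the file, and returns only the URL. The parent never sees the HTML.
```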
Dedup Hook
The AI re-reads files constantly. Same config file, four times in one session. Each time, the full contents go into context. It’s a crutch — the information is already there from the first read, but the AI reaches for the file again instead of using what it has.
I built a PreToolUse hook on the Read tool that tracks access counts per file per session:
– First read: goes through clean
– Second read: warning injected — “this is your second read, extract what you need now”
– Third read: truncated to 50 lines
– Fourth and beyond: blocked
The counter resets on a write or edit, since the file content has changed and a fresh read makes sense.
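Here's a minimal sketch of that logic, assuming the hook contract as documented: the tool call arrives as JSON on stdin, and exiting with code 2 blocks the call and feeds stderr back to the model. The second-read warning and the write/edit reset are elided, and truncation is approximated by steering the model toward the Read tool's limit parameter:

```python
#!/usr/bin/env python3
# PreToolUse hook sketch: throttle repeated Reads of the same file.
# Simplified: the second-read warning and the write/edit counter
# reset are elided, and "truncation" is approximated by telling the
# model to re-issue the Read with its limit parameter.
import json
import sys
from pathlib import Path

payload = json.load(sys.stdin)               # tool call arrives as JSON on stdin
if payload.get("tool_name") != "Read":
    sys.exit(0)                              # only police the Read tool

tool_input = payload.get("tool_input", {})
file_path = tool_input.get("file_path", "")
if tool_input.get("limit") and tool_input["limit"] <= 50:
    sys.exit(0)                              # truncated retry: allow, uncounted

session = payload.get("session_id", "default")
state = Path(f"/tmp/read-counts-{session}.json")
counts = json.loads(state.read_text()) if state.exists() else {}
counts[file_path] = counts.get(file_path, 0) + 1
state.write_text(json.dumps(counts))

n = counts[file_path]
if n == 3:
    print(f"Third read of {file_path}: re-issue with limit=50 and "
          "extract what you need now.", file=sys.stderr)
    sys.exit(2)                              # exit 2 blocks; stderr goes to the model
if n >= 4:
    print(f"Read blocked: {file_path} already read {n - 1} times this "
          "session. Use what is in context.", file=sys.stderr)
    sys.exit(2)
sys.exit(0)                                  # first and second reads pass
```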
This single hook had more impact than anything else I built. It breaks the re-read habit and forces the AI to actually retain information.
Scratchpad
Every report the AI generates goes to an HTML file and gets published to a web-accessible scratchpad. The AI outputs one line: the URL. I read it in a browser. The context window holds a URL, not 15K tokens of formatted HTML.
938+ scratchpad reports so far. At roughly 3,200 tokens average, that’s about 3 million tokens of content that never entered the context window.
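The publishing step itself is trivial. A sketch, assuming a directory the web server already exposes (the paths and domain below are placeholders, not my setup):

```python
#!/usr/bin/env python3
# Scratchpad publisher sketch: write the HTML, print only the URL.
# WEB_ROOT and BASE_URL are placeholders, not my actual setup.
import sys
import time
from pathlib import Path

WEB_ROOT = Path("/var/www/scratchpad")        # directory the web server serves
BASE_URL = "https://example.com/scratchpad"

def publish(html: str, slug: str = "report") -> str:
    name = f"{time.strftime('%Y%m%d-%H%M%S')}-{slug}.html"
    (WEB_ROOT / name).write_text(html)
    return f"{BASE_URL}/{name}"

if __name__ == "__main__":
    # Pipe the finished HTML in; the one-line URL is all that comes back.
    print(publish(sys.stdin.read(), *sys.argv[1:2]))
```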
Selective Memory Loading
I have 51 memory files — project context, behavioral corrections, client details, reference pointers. They’re indexed by a single MEMORY.md file that the AI always loads. It reads the index, decides which files matter for the current task, and loads only those.
Working on stt.news? Load the theme rebuild history. Skip the PromoGosh import, the IslandBarter palette, the CRA compliance suite. Working on a server deploy? Load the fleet architecture. Skip the editorial guidelines.
Loading everything is hoarding. Loading what you need is conservation.
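To make that load/skip decision cheap, every index entry carries a one-line description. A hypothetical excerpt, with entries condensed for illustration:

```markdown
# MEMORY.md index. Always loaded; everything else is on demand.

- memory/stt-news-theme.md: stt.news theme rebuild history (load for stt.news work)
- memory/fleet-architecture.md: server roles and deploy paths (load for deploys)
- memory/editorial-guidelines.md: tone and style rules (load for client-facing content)
- memory/promogosh-import.md: PromoGosh import notes (rarely needed)
```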
Model Assignment
Not every task needs the expensive model. I explicitly assign models by job:
– Haiku: health checks, data gathering, email scanning, file searches, cron analysis
– Sonnet: complex debugging, code review, planning
– Opus: architecture decisions, security audits, client-facing content
When a Haiku subagent runs 16 tool calls, those calls consume Haiku-priced context. The Opus parent stays clean for the work that actually needs deep reasoning.
Results
Morning briefing: 147K tokens down to under 12K. Same information, 92% less context consumed. The AI starts each day with over 98% of its window available.
Daily savings stack up: ~135K saved on morning boot, ~80K saved on report generation, plus whatever the dedup hook blocks (hard to measure exactly, but it fires constantly). Sessions now run 8-12 hours without context pressure. I do multi-server deployments, full client deliverable cycles, and QA passes in single sessions. Before this, I was restarting 3-4 times to get through the same work.
Every token in the context window should earn its place. Temporary data goes to a subagent. Human-readable output goes to the scratchpad. Cross-session knowledge goes to memory files. And if the AI already read something, it doesn’t get to read it again.
Context is working memory. I treat it like RAM on a production server.
Terry Arthur is the founder of Terry Arthur Consulting, a web development and AI integration consultancy based in St. Thomas, U.S. Virgin Islands.