Coordinating Multiple Agentic IDEs with a Shared Handoff File

A local-first Python invoice sorter used to explore context continuity, token efficiency, custom agent skills, and architecture review across multiple agentic IDEs.

1. Introduction
2. The Practical Implementation Target: A Local-First Invoice Sorter
3. The Real Problem: Context Does Not Travel Between Agentic IDEs
4. The Handoff File: A Small Coordination Layer
5. Aligning customizations, skills, and agent expectations
6. Privacy Was Not an Add-On
7. Deterministic First, AI Optional
8. The final architecture review with IBM Bob
9. Architecture Review Results: Strong Foundation, Clear Risks
10. What I Learned from the Experiment
10.1 Multiple agents need coordination
10.2 The handoff file reduced context loss
10.3 Customizations still matter
10.4 Architecture review is a useful final step
10.5 Human review remains necessary
11. Limitations of This Experiment
12. Cost and Effort of the Experiment
12.1 Project Context
12.2 Time Investment
12.3 Tool Usage and Cost View
12.4 Practical Cost Summary
12.5 Key Observation
13. Conclusion
14. References

1. Introduction

Modern AI coding assistants are becoming more than autocomplete tools. They increasingly work as agentic IDE extensions that can inspect code, modify files, run tests, and support architecture decisions.

In my daily work, I use several AI coding tools. Each tool has its own strengths, limitations, context window, license constraints, and customization model.

The question I wanted to explore was:

Does a coordinated handoff approach have an impact when working with multiple agentic IDEs in the same project?

The repository used for this experiment is:

https://github.com/thomassuedbroecker/tax_preorganizer_public

I used several agentic tools side by side in one VS Code workspace. In this example, OpenAI Codex and Claude Code carried most of the implementation, GitHub Copilot provided inline support, and IBM Bob was used for a dedicated architecture-review pass. The focus of this post is less on how many tools were involved and more on how they shared one project through a common docs/HANDOFF.md document.

One thing I did not use in this repository was an AGENTS.md file. Such a file could be helpful as an additional project-level instruction layer for agentic IDEs. It could define coding rules, review expectations, privacy constraints, and preferred workflows directly inside the repository.

However, there is also a trade-off. An AGENTS.md file can become too strongly tailored to one specific agentic IDE or tool convention. In this experiment, I wanted the handoff mechanism to stay lightweight, readable, and usable across different tools and for humans.

2. The Practical Implementation Target: A Local-First Invoice Sorter

The implementation target was a local-first Python invoice pre-organizer.

The tool scans a folder of PDF and image invoices, extracts text and metadata, classifies each document into configurable categories, copies the files into category folders, and creates a Markdown summary for a tax advisor.

The application supports German and English invoice metadata such as vendor, invoice date, invoice number, gross amount, VAT amount, net amount, currency, and IBAN.
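To make the metadata list more concrete, here is a minimal sketch of how such a record could be modeled in Python. The class name `InvoiceMetadata`, the field names, and the `amounts_consistent` check are assumptions for illustration. They are not taken from the repository.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class InvoiceMetadata:
    """Hypothetical container for the metadata fields named in the text."""

    vendor: Optional[str] = None
    invoice_date: Optional[date] = None
    invoice_number: Optional[str] = None
    gross_amount: Optional[Decimal] = None
    vat_amount: Optional[Decimal] = None
    net_amount: Optional[Decimal] = None
    currency: Optional[str] = None
    iban: Optional[str] = None

    def amounts_consistent(
        self, tolerance: Decimal = Decimal("0.01")
    ) -> Optional[bool]:
        """Check net + VAT against gross; return None if a value is missing."""
        if None in (self.gross_amount, self.vat_amount, self.net_amount):
            return None
        difference = self.net_amount + self.vat_amount - self.gross_amount
        return abs(difference) <= tolerance
```

Every field is optional because extraction from scanned invoices can fail for any single value. A check that returns None for incomplete data is one possible signal for routing a document to manual review.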

The main outputs are:

invoice_summary.md
audit_log.jsonl
performance_log.json

This project was useful as a test case because it is small enough to understand, but still realistic enough to include privacy, classification, reporting, GUI, local AI review, and architecture-review concerns.

The invoice sorter was therefore not only a useful tool. It was also a realistic project to observe how different agentic IDEs behave when they must continue each other’s work.

3. The Real Problem: Context Does Not Travel Between Agentic IDEs

Working with one AI coding assistant can already be complex. Working with several introduces a different problem:

Each tool has its own isolated context. GitHub Copilot, OpenAI Codex, Claude Code, IBM Bob, and other agentic IDE tools do not automatically share:

what was already implemented
why a decision was made
which tests were executed
which limitations were found
which files were changed
what the next useful task should be
which custom instructions or skills were relevant

Without coordination, each tool may rediscover the same context again and again.

That costs time, tokens, and focus.

4. The Handoff File: A Small Coordination Layer

The handoff file is not a complex framework. It is a simple shared memory between agentic IDE sessions.
In this project, each agentic IDE session followed the same basic discipline:

1. read the current handoff file
2. continue only a bounded task
3. document what changed
4. document verification evidence
5. document known limitations
6. define the next useful work item

Example:

<!-- Structure example for docs/HANDOFF.md -->
## Current Handoff State

- **Current Status:** [e.g., PySide6 GUI skeleton initialized]
- **Recent Decisions:** [e.g., Kept sorting logic strictly rule-based, no LLM pipeline for file movement]
- **Verification Evidence:** [e.g., `pytest tests/test_sorter.py` passing locally]
- **Known Limitations:** [e.g., High memory consumption on 50MB+ PDFs]
- **Next Task for Next Agent:** [e.g., Implement exception handling for malformed IBAN extraction]

The repository uses `docs/HANDOFF.md` as this shared continuity document. This makes it possible to rotate between tools when one context window or token allowance is exhausted, without losing the current project state.
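The discipline can also be checked mechanically. The following is a minimal sketch, assuming the handoff file uses the bullet layout from the structure example above. The helper names `parse_handoff` and `missing_fields` are hypothetical and not part of the repository.

```python
import re

# Field names taken from the structure example for docs/HANDOFF.md.
REQUIRED_FIELDS = (
    "Current Status",
    "Recent Decisions",
    "Verification Evidence",
    "Known Limitations",
    "Next Task for Next Agent",
)

# Matches lines such as: - **Current Status:** some text
FIELD_PATTERN = re.compile(
    r"^- \*\*(?P<name>[^:*]+):\*\*[ \t]*(?P<value>.*)$", re.MULTILINE
)


def parse_handoff(text: str) -> dict[str, str]:
    """Return a mapping of field name to value for every bullet found."""
    return {
        match.group("name").strip(): match.group("value").strip()
        for match in FIELD_PATTERN.finditer(text)
    }


def missing_fields(text: str) -> list[str]:
    """List required fields that are absent or left empty."""
    fields = parse_handoff(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A check like this could run before a session ends, so that an agent or a human notices when verification evidence or the next task was not recorded.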

The important point is not the file format.

The important point is the discipline:

Every agent should leave the project in a state that the next agent or human can understand.

5. Aligning customizations, skills, and agent expectations

The handoff file alone was not the complete solution.

I also tried to make sure that the different tools worked with my own customizations and expectations. This included things like:

custom instructions
skill or agent definitions
coding style expectations
verification expectations
privacy constraints
local-first assumptions
task boundaries
review expectations

This matters because different agentic IDEs use different mechanisms for customization. A skill, mode, prompt file, project instruction, or IDE-specific configuration is not automatically portable across tools. An AGENTS.md file could be another option for repository-level guidance, but it should be used carefully. If it becomes too tool-specific, it may reduce portability between different agentic IDEs.

The goal was not to make all tools identical.

The goal was to reduce unnecessary drift.

The handoff file preserved project state, while the customizations tried to preserve working style.

6. Privacy Was Not an Add-On

Invoices contain sensitive personal and financial data. Therefore, privacy was not an additional feature. It was an architectural constraint.

The project uses a local-first processing model:

no network access in the normal invoice-processing path
no full invoice text written to reports or audit logs
copy-by-default instead of move-by-default
dry-run mode before files are touched
manual-review routing for uncertain documents
optional local Ollama review after deterministic sorting

There is one important nuance: the optional Docling backend may download models during initial setup. However, the invoice-processing path itself is designed so that invoice data is not uploaded.

This distinction matters. Local-first does not only mean “AI runs locally.” It also means that file handling, reports, logs, and review workflows are designed around data minimization.
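Two of the constraints above, copy-by-default and dry-run before files are touched, can be sketched in a few lines. This is an illustration under assumptions: the function name `place_invoice` and its parameters are hypothetical and do not describe the repository's actual API.

```python
import shutil
from pathlib import Path


def place_invoice(
    source: Path,
    category_dir: Path,
    *,
    dry_run: bool = True,
    move: bool = False,
) -> Path:
    """Return the target path; change the file system only if dry_run is False."""
    target = category_dir / source.name
    if dry_run:
        # Report the planned target without creating folders or files.
        return target
    category_dir.mkdir(parents=True, exist_ok=True)
    if move:
        shutil.move(str(source), str(target))
    else:
        # Copy is the default, so the original invoice stays in place.
        shutil.copy2(source, target)
    return target
```

Both defaults point in the safe direction: a caller has to opt in explicitly to touch files, and again to move instead of copy.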

7. Deterministic First, AI Optional

One design decision was important:

The AI is not the core decision engine.

The deterministic pipeline scans, extracts, classifies, routes, and reports. The optional local AI review is added after the deterministic sorting step.

This keeps the system more predictable.

The rule-based classifier remains responsible for classification. The optional AI review can inspect the result, summarize anomalies, and support human review, but it does not move files or change categories.

AI should support inspection and review, not silently replace deterministic control flow.
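The separation can be sketched as follows. This is a simplified illustration, not the project's classifier: the keyword rules, the `manual_review` category name, and the function names are assumptions. The point is only that the reviewer returns a note and never changes the deterministic result.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword rules; the real project uses configurable categories.
RULES = {
    "insurance": ("versicherung", "insurance"),
    "office": ("büro", "office supplies"),
}
MANUAL_REVIEW = "manual_review"


@dataclass(frozen=True)
class Classification:
    category: str
    matched_keyword: Optional[str]


def classify(text: str) -> Classification:
    """Rule-based decision: exactly one matching category, else manual review."""
    lowered = text.lower()
    matches = [
        (category, keyword)
        for category, keywords in RULES.items()
        for keyword in keywords
        if keyword in lowered
    ]
    if len({category for category, _ in matches}) != 1:
        return Classification(MANUAL_REVIEW, None)
    category, keyword = matches[0]
    return Classification(category, keyword)


def review(
    result: Classification,
    reviewer: Optional[Callable[[Classification], str]] = None,
) -> tuple[Classification, Optional[str]]:
    """Attach an optional review note; the classification is returned unchanged."""
    note = reviewer(result) if reviewer else None
    return result, note
```

Because `Classification` is frozen and `review` returns the same object it received, a local model behind `reviewer` can comment on a result but cannot alter the category.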

8. The final architecture review with IBM Bob

After the implementation work, I used IBM Bob for a structured architecture review. This was interesting because the review was not just another code-generation step. It was a post-implementation quality gate.

For the final review, I used a dedicated custom architecture-reviewer mode in IBM Bob. What made this valuable was not the specific tool, but that a structured, persona-driven review pass turned the codebase into a concrete, line-referenced backlog rather than a vague “looks good.”

The final architecture review was performed with IBM Bob in the custom reviewer mode on 20 June 2026. It covered the full codebase and applied seven architecture dimensions:

business alignment
security and threat modeling
scalability and performance
architecture patterns
maintainability and technical debt
documentation
Twelve-Factor compliance

The architecture review documents the review date, the full-codebase scope, and the seven architecture dimensions. It also explains that the source files in src/invoice_sorter/, the test suite, configuration, CI workflow, and documentation files were inspected before the findings were created. [4]

9. Architecture Review Results: Strong Foundation, Clear Risks

The architecture review did not only confirm that the project works. It produced a concrete engineering backlog.

The positive findings were that the project is:

well-conceived
privacy-first
clean and testable
internally consistent
clearly separated between deterministic and probabilistic processing

The review also identified risks:

prompt injection through extracted metadata
IBAN values in audit logs
missing file size limits
sequential processing bottlenecks
all results held in RAM
partial Twelve-Factor compliance
rough edges in the agent tier

For me, this was one of the most useful outcomes of the experiment.

The workflow did not stop after code generation. It moved into review, risk identification, and backlog creation.

That is the difference between “AI helped me code” and “AI supported an engineering workflow.”
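Two of the listed risks, IBAN values in audit logs and missing file size limits, lend themselves to small guards. The following sketch shows one possible mitigation under stated assumptions: the regular expression is a simplified IBAN shape without checksum validation, the 25 MB limit is an arbitrary example value, and all function names are hypothetical.

```python
import json
import re
from typing import Optional

# Simplified IBAN shape: country code, two check digits, then groups of four
# alphanumerics with optional spaces. Uppercase only, no checksum validation.
IBAN_PATTERN = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumed example limit, not a project value


def mask_iban(value: str) -> str:
    """Keep the first and last four characters of each IBAN, mask the rest."""

    def _mask(match: re.Match) -> str:
        compact = match.group(0).replace(" ", "")
        return compact[:4] + "*" * (len(compact) - 8) + compact[-4:]

    return IBAN_PATTERN.sub(_mask, value)


def audit_record(file_name: str, category: str, iban: Optional[str]) -> str:
    """Build one JSON line for an audit log with the IBAN masked."""
    record = {
        "file": file_name,
        "category": category,
        "iban": mask_iban(iban) if iban else None,
    }
    return json.dumps(record, ensure_ascii=False)


def within_size_limit(size_bytes: int, limit: int = MAX_FILE_BYTES) -> bool:
    """Reject empty files and files above the configured limit."""
    return 0 < size_bytes <= limit
```

Masking at the point where the log line is built means the full IBAN never reaches the file, regardless of which code path produced the record.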

10. What I Learned from the Experiment

10.1 Multiple agents need coordination

Using multiple agentic IDEs is not automatically productive. Without shared context, tools can duplicate work, repeat analysis, or make inconsistent assumptions.

10.2 The handoff file reduced context loss

The handoff file helped preserve project state and made it easier to continue work across tools. It did not replace thinking, but it reduced unnecessary rediscovery.

10.3 Customizations still matter

A handoff file documents project state. Custom instructions, skills, modes, and project rules help preserve working style.

Both are needed.

10.4 Architecture review is a useful final step

The IBM Bob review helped move the project from an AI-assisted implementation toward a more structured engineering workflow with a concrete improvement backlog.

10.5 Human review remains necessary

The repository itself states that all resulting changes remain subject to human review.

This is important. Agentic IDEs can accelerate work, but they do not remove responsibility from the developer.

11. Limitations of This Experiment

The architecture review was a structured self-review inside my own tooling workflow, not an independent external audit. Even so, it was usefully candid: it surfaced concrete issues such as duplicated code, missing input sanitization, and a few features that were closer to prototype than finished, and it collected them into an actionable backlog.

I did not compare all tools under identical laboratory conditions. I also did not measure token savings with a formal before-and-after methodology.
I only observed the workflow on a single Git branch. I did not test feature branches or bug-fix branches. That would add another layer of complexity.

The observations are based on one real project and my own workflow. Still, the experiment was useful because it made the coordination problem visible:

context does not automatically travel between tools
customizations are not automatically portable
handoff discipline reduces repeated explanations
architecture review can turn implementation output into an actionable backlog

So the value of this repository is not a universal benchmark.

It is a practical example of how coordinated multi-agent IDE development can work in a real project.

12. Cost and Effort of the Experiment

This was a weekend experiment, not a production project.

The direct infrastructure cost of the app was almost zero because the project runs locally as a Python application. The real cost came from the AI coding tools, the coordination effort, the usage limits, and the cleanup work between the internal and public project versions.

12.1 Project Context

| Area | Description |
| --- | --- |
| Experiment type | Weekend experiment |
| Project type | Local-first Python application |
| Main app purpose | Invoice pre-organizer for PDF and image invoices |
| Runtime infrastructure cost | Almost zero during the experiment |
| Internal/private GitHub project | Used for the real working process, iteration, and AI-assisted development |
| Public GitHub project | Cleaned-up version for sharing the result openly |
| Public repository | https://github.com/thomassuedbroecker/tax_preorganizer_public |

12.2 Time Investment

| Effort Area | Approximate Effort |
| --- | --- |
| Total hands-on work | 10–12 hours |
| Main working window | Friday evening into Saturday |
| Coding and implementation | Part of the total effort |
| AI tool coordination | Significant part of the total effort |
| Review and verification | Significant part of the total effort |
| Cleanup for public repository | Additional effort after internal iteration |

12.3 Tool Usage and Cost View

| Tool | Plan / Usage | Experiment Impact |
| --- | --- | --- |
| ChatGPT Plus / Codex | Existing ChatGPT Plus subscription | Usage limits reached approximately twice |
| Claude Code | Claude Pro plan | Usage limits reached approximately twice |
| GitHub Copilot | Free plan | Used within free plan limits |
| IBM Bob | Pro plan with 40 Bobcoins | Around 8 Bobcoins used |
| IBM Bob cost perspective | 8 of 40 Bobcoins | About 20% of the Pro plan allowance |

12.4 Practical Cost Summary

| Cost Type | Impact |
| --- | --- |
| App infrastructure cost | Almost zero |
| AI subscription cost | Existing monthly subscriptions |
| IBM Bob usage | Around 8 Bobcoins |
| Human engineering time with agentic IDEs | 10–12 hours |
| Coordination cost | High compared to app runtime cost |
| Review cost | Required to keep the result reliable |
| Cleanup cost | Needed to separate internal work from the public repository |

12.5 Key Observation

| Observation | Meaning |
| --- | --- |
| The app itself was inexpensive to run | Local-first Python avoided cloud runtime costs |
| The tool cost was visible but manageable | Existing subscriptions and limited Bobcoin usage were enough for the experiment |
| The real cost was coordination | Multiple agentic IDEs need shared context, review, and handoff documentation |
| Usage limits mattered | Agentic workflows can hit plan limits even during a small weekend experiment |
| Public sharing adds effort | A cleaned-up public repository needs additional review and separation from internal work |

For me, this was one of the most important findings: The cost of agentic coding is not only the subscription price.

The real cost appears in coordination, review, verification, context handoff, and preparing a clean public version from an internal development workflow.

13. Conclusion

This experiment started with a simple local-first invoice pre-organizer, but the more important result was the workflow around the app, not the app itself. Using several agentic IDEs in the same project created a practical coordination problem. Each tool could help with coding, review, testing, or architecture thinking, but none of them automatically shared the full project memory with the others.

The docs/HANDOFF.md file helped reduce this problem.

It created a lightweight continuity layer for:

current status
implementation decisions
verification evidence
known limitations
next useful tasks

The experiment also showed that the main cost of agentic coding was not the local application runtime. The real cost appeared in coordination, review, usage limits, context handoff, and preparing a clean public version from the internal working project.

For me, the main takeaway is:

Agentic IDEs become more useful when their work becomes traceable, bounded, reviewable, and coordinated. At that point, AI-assisted development starts to shift from isolated prompting toward a more disciplined engineering workflow.

14. References

[1] GitHub Repository: `tax_preorganizer_public`
https://github.com/thomassuedbroecker/tax_preorganizer_public

[2] README: Multi-agent IDE collaboration example and local-first invoice sorter
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/README.md

[3] Handoff document: Shared continuity layer between agentic coding tools
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/docs/HANDOFF.md

[4] Architecture Review with IBM Bob
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/docs/ARCHITECTURE_REVIEW.md

[5] Content Provenance: AI-assisted development disclosure and human review
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/CONTENT_PROVENANCE.md

[6] Architecture documentation: System design, data flow, and module responsibilities
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/ARCHITECTURE.md

[7] Quick Start: Setup and first dry run
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/docs/QUICK_START.md

Note: This post reflects my own ideas, implementation work, and experience. AI was used as a writing and thinking aid to help structure and clarify the arguments, but not to define the conclusions.

#AIEngineering #AgenticCoding #DeveloperWorkflow #GitHubCopilot #OpenAICodex #ClaudeCode #IBMBob #Python #LocalFirst #PrivacyByDesign #SoftwareEngineering #ArchitectureReview
