A local-first Python invoice sorter used to explore context continuity, token efficiency, custom agent skills, and architecture review across multiple agentic IDEs.
1. Introduction
Modern AI coding assistants are becoming more than autocomplete tools. They increasingly work as agentic IDE extensions that can inspect code, modify files, run tests, and support architecture decisions.
In my daily work, I use several AI coding tools. Each tool has its own strengths, limitations, context window, license constraints, and customization model.
The question I wanted to explore was:
Does a coordinated handoff approach have an impact when working with multiple agentic IDEs in the same project?
The repository used for this experiment is:
https://github.com/thomassuedbroecker/tax_preorganizer_public
The repository is a multi-agent IDE collaboration example using GitHub Copilot, OpenAI Codex, and Claude Code in one Visual Studio Code workspace, coordinated through `docs/HANDOFF.md`.
I call this lightweight coordination approach the handoff pattern.
In addition, I used IBM Bob for the final architecture review.
One thing I did not use in this repository was an AGENTS.md file. Such a file could be helpful as an additional project-level instruction layer for agentic IDEs. It could define coding rules, review expectations, privacy constraints, and preferred workflows directly inside the repository.
However, there is also a trade-off. An AGENTS.md file can become too strongly tailored to one specific agentic IDE or tool convention. In this experiment, I wanted the handoff mechanism to stay lightweight, readable, and usable both across different tools and by humans.



2. The Practical Implementation Target: A Local-First Invoice Sorter
The implementation target was a local-first Python invoice pre-organizer.
The tool scans a folder of PDF and image invoices, extracts text and metadata, classifies each document into configurable categories, copies the files into category folders, and creates a Markdown summary for a tax advisor.
The application supports German and English invoice metadata such as vendor, invoice date, invoice number, gross amount, VAT amount, net amount, currency, and IBAN.
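Supporting both German and English metadata means amounts can appear as `1.234,56` or `1,234.56`. The following is a minimal sketch of how such strings could be normalized; the function name `parse_amount` and its heuristic are assumptions for illustration, not the repository's actual extraction code.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse a German ('1.234,56') or English ('1,234.56') amount string.

    Hypothetical helper: the rightmost separator is treated as the decimal
    mark. A value with a single separator such as '1.234' is ambiguous and
    is read as 1.234 here, so real code would need extra context.
    """
    # Keep only digits, separators, and a minus sign, then drop stray punctuation.
    cleaned = re.sub(r"[^\d.,-]", "", text).strip(".,")
    if not re.search(r"\d", cleaned):
        raise ValueError(f"no amount found in {text!r}")
    if cleaned.rfind(",") > cleaned.rfind("."):
        # German style: dots group thousands, the comma marks decimals.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        # English style: commas group thousands, the dot marks decimals.
        cleaned = cleaned.replace(",", "")
    return Decimal(cleaned)


print(parse_amount("1.234,56 €"))
print(parse_amount("USD 1,234.56"))
```

Using `Decimal` instead of `float` avoids rounding surprises when gross, net, and VAT amounts are later compared.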
The main outputs are:
- invoice_summary.md
- audit_log.jsonl
- performance_log.json
This project was useful as a test case because it is small enough to understand, but still realistic enough to include privacy, classification, reporting, GUI, local AI review, and architecture-review concerns.
The invoice sorter was therefore not only a useful tool. It was also a realistic project to observe how different agentic IDEs behave when they must continue each other’s work.
3. The Real Problem: Context Does Not Travel Between Agentic IDEs
Working with one AI coding assistant can already be complex. Working with several introduces a different problem:
Each tool has its own isolated context. GitHub Copilot, OpenAI Codex, Claude Code, IBM Bob, and other agentic IDE tools do not automatically share:
- what was already implemented
- why a decision was made
- which tests were executed
- which limitations were found
- which files were changed
- what the next useful task should be
- which custom instructions or skills were relevant
Without coordination, each tool may rediscover the same context again and again.
That costs time, tokens, and focus.
4. The Handoff File: A Small Coordination Layer
The handoff file is not a complex framework. It is a simple shared memory between agentic IDE sessions.
In this project, each agentic IDE session followed the same basic discipline:
1. read the current handoff file
2. continue only a bounded task
3. document what changed
4. document verification evidence
5. document known limitations
6. define the next useful work item
Example:
<!-- Structure example for docs/HANDOFF.md -->
## Current Handoff State
- **Current Status:** [e.g., PySide6 GUI skeleton initialized]
- **Recent Decisions:** [e.g., Kept sorting logic strictly rule-based, no LLM pipeline for file movement]
- **Verification Evidence:** [e.g., `pytest tests/test_sorter.py` passing locally]
- **Known Limitations:** [e.g., High memory consumption on 50MB+ PDFs]
- **Next Task for Next Agent:** [e.g., Implement exception handling for malformed IBAN extraction]
The repository uses `docs/HANDOFF.md` as this shared continuity document. This makes it possible to rotate between tools when one context window or token allowance is exhausted, without losing the current project state.
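The handoff structure can also be produced programmatically, for example by a small script run at the end of a session. This is a minimal sketch; `HandoffState` and `render_handoff` are hypothetical names, and the field labels follow the structure example for `docs/HANDOFF.md`.

```python
from dataclasses import dataclass


@dataclass
class HandoffState:
    """Hypothetical container for the five handoff fields."""
    current_status: str
    recent_decisions: str
    verification_evidence: str
    known_limitations: str
    next_task: str


def render_handoff(state: HandoffState) -> str:
    """Render the state as a Markdown section for docs/HANDOFF.md."""
    lines = [
        "## Current Handoff State",
        f"- **Current Status:** {state.current_status}",
        f"- **Recent Decisions:** {state.recent_decisions}",
        f"- **Verification Evidence:** {state.verification_evidence}",
        f"- **Known Limitations:** {state.known_limitations}",
        f"- **Next Task for Next Agent:** {state.next_task}",
    ]
    return "\n".join(lines) + "\n"


example = HandoffState(
    current_status="PySide6 GUI skeleton initialized",
    recent_decisions="Sorting logic stays rule-based",
    verification_evidence="`pytest tests/test_sorter.py` passing locally",
    known_limitations="High memory consumption on 50MB+ PDFs",
    next_task="Handle malformed IBAN extraction",
)
print(render_handoff(example))
```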
The important point is not the file format.
The important point is the discipline:
Every agent should leave the project in a state that the next agent or human can understand.
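This discipline could be checked mechanically before a session ends. The sketch below assumes the handoff file uses bold field labels such as `- **Current Status:**`; the function `missing_handoff_fields` is a hypothetical helper, not part of the repository.

```python
import re

# Field labels assumed from the handoff structure; adjust to the real file.
REQUIRED_FIELDS = (
    "Current Status",
    "Recent Decisions",
    "Verification Evidence",
    "Known Limitations",
    "Next Task for Next Agent",
)


def missing_handoff_fields(markdown: str) -> list[str]:
    """Return required fields that are absent or left empty."""
    found: dict[str, str] = {}
    pattern = r"^- \*\*(.+?):\*\*[ \t]*(.*)$"
    for match in re.finditer(pattern, markdown, flags=re.MULTILINE):
        found[match.group(1).strip()] = match.group(2).strip()
    return [name for name in REQUIRED_FIELDS if not found.get(name)]


sample = """## Current Handoff State
- **Current Status:** GUI skeleton initialized
- **Recent Decisions:** Sorting stays rule-based
- **Verification Evidence:** pytest passing locally
- **Known Limitations:**
"""
print(missing_handoff_fields(sample))
```

An empty result would mean the next agent or human finds every field filled in; it says nothing about whether the content is accurate, which still needs review.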
5. Aligning Customizations, Skills, and Agent Expectations
The handoff file alone was not the complete solution.
I also tried to make sure that the different tools worked with my own customizations and expectations. This included things like:
- custom instructions
- skill or agent definitions
- coding style expectations
- verification expectations
- privacy constraints
- local-first assumptions
- task boundaries
- review expectations
This matters because different agentic IDEs use different mechanisms for customization. A skill, mode, prompt file, project instruction, or IDE-specific configuration is not automatically portable across tools. An AGENTS.md file could be another option for repository-level guidance, but it should be used carefully. If it becomes too tool-specific, it may reduce portability between different agentic IDEs.
The goal was not to make all tools identical.
The goal was to reduce unnecessary drift.
The handoff file preserved project state, while the customizations tried to preserve working style.
6. Privacy Was Not an Add-On
Invoices contain sensitive personal and financial data. Therefore, privacy was not an additional feature. It was an architectural constraint.
The project uses a local-first processing model:
- no network access in the normal invoice-processing path
- no full invoice text written to reports or audit logs
- copy-by-default instead of move-by-default
- dry-run mode before files are touched
- manual-review routing for uncertain documents
- optional local Ollama review after deterministic sorting
There is one important nuance: the optional Docling backend may download models during initial setup. However, the invoice-processing path itself is designed so that invoice data is not uploaded.
This distinction matters. Local-first does not only mean “AI runs locally.” It also means that file handling, reports, logs, and review workflows are designed around data minimization.
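Copy-by-default and dry-run can be expressed as safe defaults in the file-handling function itself. The following is a minimal sketch under that assumption; `place_invoice` and its parameters are hypothetical and not taken from the repository.

```python
import shutil
from pathlib import Path


def place_invoice(
    source: Path,
    category_dir: Path,
    *,
    dry_run: bool = True,
    move: bool = False,
) -> Path:
    """Return the target path; touch the file system only when dry_run is False.

    Hypothetical helper: the defaults are the safest options, so a caller
    must opt in explicitly to writing files and, separately, to moving them.
    """
    target = category_dir / source.name
    if dry_run:
        # Report what would happen without creating folders or files.
        return target
    category_dir.mkdir(parents=True, exist_ok=True)
    if move:
        shutil.move(str(source), str(target))
    else:
        # copy2 also preserves file metadata such as modification times.
        shutil.copy2(source, target)
    return target
```

With this shape, forgetting a flag leads to no change at all instead of a moved or lost original.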
7. Deterministic First, AI Optional
One design decision was important:
The AI is not the core decision engine.
The deterministic pipeline scans, extracts, classifies, routes, and reports. The optional local AI review is added after the deterministic sorting step.
This keeps the system more predictable.
The rule-based classifier remains responsible for classification. The optional AI review can inspect the result, summarize anomalies, and support human review, but it does not move files or change categories.
AI should support inspection and review, not silently replace deterministic control flow.
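The separation between deterministic control flow and optional review can be sketched as follows. The keyword rules, category names, and function names are hypothetical; the point is only that the reviewer receives immutable results and returns notes, so it cannot change categories.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword rules; a real configuration would be user-defined.
RULES = {
    "telecom": ("telekom", "vodafone", "mobilfunk"),
    "insurance": ("versicherung", "insurance"),
}


@dataclass(frozen=True)
class Classification:
    filename: str
    category: str
    matched_keywords: tuple[str, ...]


def classify(filename: str, text: str) -> Classification:
    """Rule-based classification; documents without a match go to manual review."""
    lowered = text.lower()
    best_category: str = "manual_review"
    best_hits: tuple[str, ...] = ()
    for category, keywords in RULES.items():
        hits = tuple(k for k in keywords if k in lowered)
        if len(hits) > len(best_hits):
            best_category, best_hits = category, hits
    return Classification(filename, best_category, best_hits)


def run_pipeline(
    documents: dict[str, str],
    reviewer: Optional[Callable[[list[Classification]], str]] = None,
) -> tuple[list[Classification], Optional[str]]:
    """Classify deterministically first, then optionally collect review notes."""
    results = [classify(name, text) for name, text in documents.items()]
    # The reviewer gets a copy of the list of frozen results and only returns text.
    notes = reviewer(list(results)) if reviewer else None
    return results, notes
```

A local model could be plugged in as `reviewer`, for example a function that summarizes anomalies, while the sorting outcome stays identical with or without it.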
8. The Final Architecture Review with IBM Bob
After the implementation work, I used IBM Bob for a structured architecture review. This was interesting because the review was not just another code-generation step. It was a post-implementation quality gate.
The final architecture review was performed with IBM Bob in reviewer mode on 20 June 2026. It covered the full codebase and applied seven architecture dimensions:
- business alignment
- security and threat modeling
- scalability and performance
- architecture patterns
- maintainability and technical debt
- documentation
- Twelve-Factor compliance
The review document records the date, the full-codebase scope, and the seven dimensions. It also states that the source files in `src/invoice_sorter/`, the test suite, the configuration, the CI workflow, and the documentation files were inspected before the findings were written. [4]
9. Architecture Review Results: Strong Foundation, Clear Risks
The architecture review did not only confirm that the project works. It produced a concrete engineering backlog.
The positive findings were that the project is:
- well-conceived
- privacy-first
- clean and testable
- internally consistent
- clearly separated between deterministic and probabilistic processing
The review also identified risks:
- prompt injection through extracted metadata
- IBAN values in audit logs
- missing file size limits
- sequential processing bottlenecks
- all results held in RAM
- partial Twelve-Factor compliance
- rough edges in the agent tier
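Two of these risks, IBAN values in audit logs and missing file size limits, lend themselves to small guard functions. The sketch below shows one possible shape of such mitigations; the names, the masking format, and the 25 MB limit are assumptions and do not describe what the review recommended or what the repository implements.

```python
from pathlib import Path

# Hypothetical limit; a suitable value depends on the expected invoices.
MAX_FILE_BYTES = 25 * 1024 * 1024


def mask_iban(iban: str) -> str:
    """Keep the first and last four characters, mask everything in between."""
    compact = "".join(iban.split()).upper()
    if len(compact) <= 8:
        # Too short to keep both ends without revealing the whole value.
        return "*" * len(compact)
    return compact[:4] + "*" * (len(compact) - 8) + compact[-4:]


def within_size_limit(path: Path, max_bytes: int = MAX_FILE_BYTES) -> bool:
    """Check the file size before any expensive text extraction starts."""
    return path.stat().st_size <= max_bytes


print(mask_iban("DE89 3704 0044 0532 0130 00"))
```

Masking before the value reaches the audit log keeps entries comparable for a human reviewer without storing the full account number.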
For me, this was one of the most useful outcomes of the experiment.
The workflow did not stop after code generation. It moved into review, risk identification, and backlog creation.
That is the difference between “AI helped me code” and “AI supported an engineering workflow.”
10. What I Learned from the Experiment
10.1 Multiple agents need coordination
Using multiple agentic IDEs is not automatically productive. Without shared context, tools can duplicate work, repeat analysis, or make inconsistent assumptions.
10.2 The handoff file reduced context loss
The handoff file helped preserve project state and made it easier to continue work across tools. It did not replace thinking, but it reduced unnecessary rediscovery.
10.3 Customizations still matter
A handoff file documents project state. Custom instructions, skills, modes, and project rules help preserve working style.
Both are needed.
10.4 Architecture review is a useful final step
The IBM Bob review helped move the project from an AI-assisted implementation toward a more structured engineering workflow with a concrete improvement backlog.
10.5 Human review remains necessary
The repository itself states that all resulting changes remain subject to human review.
This is important. Agentic IDEs can accelerate work, but they do not remove responsibility from the developer.
11. Limitations of This Experiment
This was a practical development experiment, not a controlled benchmark.
I did not compare all tools under identical laboratory conditions. I also did not measure token savings with a formal before-and-after methodology.
The observations are based on one real project and my own workflow. Still, the experiment was useful because it made the coordination problem visible:
- context does not automatically travel between tools
- customizations are not automatically portable
- handoff discipline reduces repeated explanations
- architecture review can turn implementation output into an actionable backlog
So the value of this repository is not a universal benchmark.
It is a practical example of how coordinated multi-agent IDE development can work in a real project.
12. Conclusion
This repository is not just an invoice sorter. The invoice sorter was the implementation target. The more important topic was the workflow around it.
For me, the repository became a practical example of coordinated multi-agent IDE development:
- shared handoff
- aligned customizations
- local-first constraints
- deterministic-first architecture
- optional local AI review
- human review
- final architecture review with IBM Bob
The important learning is simple:
Agentic IDEs become more useful when their work becomes traceable, bounded, reviewable, and coordinated.
That is where AI coding support starts to move from isolated prompting toward a more disciplined engineering workflow.
13. References
[1] GitHub Repository: `tax_preorganizer_public`
https://github.com/thomassuedbroecker/tax_preorganizer_public
[2] README: Multi-agent IDE collaboration example and local-first invoice sorter
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/README.md
[3] Handoff document: Shared continuity layer between agentic coding tools
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/docs/HANDOFF.md
[4] Architecture Review with IBM Bob
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/docs/ARCHITECTURE_REVIEW.md
[5] Content Provenance: AI-assisted development disclosure and human review
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/CONTENT_PROVENANCE.md
[6] Architecture documentation: System design, data flow, and module responsibilities
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/ARCHITECTURE.md
[7] Quick Start: Setup and first dry run
https://github.com/thomassuedbroecker/tax_preorganizer_public/blob/main/docs/QUICK_START.md
Note: This post reflects my own ideas, implementation work, and experience. AI was used as a writing and thinking aid to help structure and clarify the arguments, but not to define the conclusions.
#AIEngineering #AgenticCoding #DeveloperWorkflow #GitHubCopilot #OpenAICodex #ClaudeCode #IBMBob #Python #LocalFirst #PrivacyByDesign #SoftwareEngineering #ArchitectureReview
