The cards above show where loops fit. This is how one actually gets built: the sequence we follow on every engagement, using the insurance intake case as the worked example.
Details are representative composites, anonymised.
- Weeks 1 to 2: discovery
- We sit with the team that handles extraction failures and write down what they actually do. Not the runbook, the reality. Which document sources break most, which fields are inferable and which are sacred, which carriers have known template quirks. Two outputs: an honest four-condition assessment of the workload, and the first draft of the skill file (the team's accumulated tribal knowledge, written down once, which the loop will read on every run). This is also where we model token spend at your document volumes and put a number in front of your CFO before anyone commits.
- Weeks 3 to 4: one reliable manual run
- Before anything is scheduled, an engineer runs the full cycle by hand: sample the failures, draft the fix, replay the historical library, review the diff. If the manual run isn't reliable, the loop won't be. Scheduling an unreliable process just automates the unreliability. We never skip this step, even when it's tempting.
- Weeks 5 to 8: the loop, assembled
- The automation watches the failure queue and runs under hard stops: a token budget, an iteration cap, a time limit. The maker agent drafts fixes in an isolated workspace. A separate checker agent (different instructions, no access to the maker's reasoning) verifies each fix against the replay benchmark before any change reaches human review. Connectors wire it into your real tools: the repo, the ticket queue, the team's channel. The state file lives in your repo, recording every classification, every fix, every escalation, and the lessons learned that used to evaporate with staff turnover.
- Weeks 9 to 12: hardening and handover
- Permissions audited and scoped to least privilege. Security checks in the gate itself: dependency audit, secret scanning, no unreviewed code path to production. Your engineers run the loop, modify the skill file, and rehearse the off switch. Then we leave, and the system stays: documented, instrumented, and yours.
- The number we build to
- Cost per accepted change. Not tokens spent, not tasks attempted. If your team rejects half of what the loop drafts, the loop is losing, and the instrumentation will show it before the invoice does.
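Two of the mechanisms above are small enough to sketch: the hard stops on the loop (token budget, iteration cap, time limit) and the number we build to. This is a minimal illustration, not the production code. The names `HardStops` and `cost_per_accepted_change` and every limit value are hypothetical, chosen only for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HardStops:
    """Limits checked before each loop iteration (hypothetical names and values)."""
    max_tokens: int
    max_iterations: int
    max_seconds: float
    tokens_used: int = 0
    iterations: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, tokens: int) -> None:
        """Account for one completed iteration and the tokens it spent."""
        self.tokens_used += tokens
        self.iterations += 1

    def tripped(self) -> Optional[str]:
        """Return the name of the first limit reached, or None if the loop may continue."""
        if self.tokens_used >= self.max_tokens:
            return "token budget"
        if self.iterations >= self.max_iterations:
            return "iteration cap"
        if time.monotonic() - self.started_at >= self.max_seconds:
            return "time limit"
        return None


def cost_per_accepted_change(total_cost: float, accepted: int) -> float:
    """Total spend divided by the changes the team accepted, not the changes drafted."""
    if accepted == 0:
        # Spend with nothing accepted: report infinity rather than hide the problem.
        return float("inf")
    return total_cost / accepted
```

The loop would call `tripped()` before starting each iteration and stop as soon as it returns a reason. For the metric, take an illustrative spend of 120 across 40 drafts, of which the team accepts 20: `cost_per_accepted_change(120.0, 20)` gives 6.0, double the 3.0 you would report per draft. That gap is what a 50% rejection rate looks like in the instrumentation.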