Sarvaswa AI Labs
Loop Engineering

The same breakage, every week. The same person, fixing it by hand. That's what a loop is for.

Claims that mis-parse. Carrier feeds that go dark. Reconciliations that back up. AI agents have become good enough to handle this class of work unsupervised, provided someone designs the system that supervises them. A loop is that system: it finds the work, hands it to an agent, verifies the result mechanically against your own historical data, records what happened, and decides the next move. We design and build those systems inside your infrastructure. Not a chatbot with a schedule. A production system your team owns, understands, and can switch off.

Should you even build one?

Most teams don't need a loop yet. We'll tell you if you're one of them.

A loop earns its cost under four conditions, and only under all four.

The work recurs.

Loops amortise their setup across many runs. If the problem doesn't come back at least weekly, you need a good prompt, not a system. We'll say so.

A machine can reject bad output.

A test suite, a schema validator, a replay against historical data, a build that compiles or doesn't. If "done" requires a human opinion, a loop just produces opinions faster.

The budget absorbs the waste.

Loops retry, re-read, and explore. That spend happens whether or not a run ships anything. We model token economics at your scale before you commit, not after the first invoice.

The agent has real tools.

Logs, historical data, a safe environment to run what it writes and watch it break. An agent without these doesn't iterate; it guesses repeatedly.

Miss one condition and the loop costs more than it returns. That assessment is where every engagement starts, and where some of them honestly end.
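As a rough illustration, the four-condition assessment can be written as an all-or-nothing check. This is a minimal sketch: the `Workload` fields and function names are hypothetical, and the weekly threshold comes from the "at least weekly" rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical summary of a candidate workload (field names are illustrative)."""
    runs_per_week: float         # how often the problem comes back
    has_machine_check: bool      # a test, validator, or replay can reject bad output
    budget_covers_retries: bool  # modelled token spend fits the budget
    agent_has_tools: bool        # logs, historical data, a safe place to run code


def failed_conditions(w: Workload) -> list[str]:
    """Return the conditions the workload misses, in the order listed above."""
    checks = {
        "the work recurs": w.runs_per_week >= 1,
        "a machine can reject bad output": w.has_machine_check,
        "the budget absorbs the waste": w.budget_covers_retries,
        "the agent has real tools": w.agent_has_tools,
    }
    return [name for name, ok in checks.items() if not ok]


def worth_building(w: Workload) -> bool:
    """A loop is worth building only when no condition is missed."""
    return not failed_conditions(w)
```

Returning the list of missed conditions, rather than a bare yes or no, keeps the assessment explainable when the answer is "not yet".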

Where loops earn their keep

The same pattern, six industries.

Every business below has the same hidden workload: repetitive, machine-checkable, quietly expensive, and arriving on someone else's schedule. That is exactly the shape of work a loop absorbs. The domain changes; the anatomy doesn't.

Insurance: claims intake that keeps breaking.

Claims arrive as PDFs, scans, broker emails, and portal uploads, in formats that shift every time a carrier updates a template or a new broker joins the panel. Extraction code decays quietly, and mis-parsed claims surface weeks later as rework and complaints.

The loop

Watches the extraction-failure queue, samples the documents that broke, drafts updates to the parsing and mapping code, and proves them against a labelled library of historical claims before a human reviews the change.

The gate

Replay accuracy. A candidate fix must extract the historical library at or above the current benchmark and resolve the new failures, or it escalates instead of shipping.
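A replay-accuracy gate of this kind might look like the sketch below. It assumes the historical library and the new failures are both available as (document, expected fields) pairs and that extraction is scored by exact match; the names `accuracy`, `gate`, and the return strings are illustrative, not a fixed interface.

```python
from typing import Callable, Mapping, Sequence

Extractor = Callable[[str], Mapping[str, str]]
# Assumption: each labelled item is (document text, expected fields).
Labelled = Sequence[tuple[str, Mapping[str, str]]]


def accuracy(extract: Extractor, library: Labelled) -> float:
    """Fraction of labelled documents whose extraction matches the label exactly."""
    if not library:
        raise ValueError("labelled library is empty")
    hits = 0
    for doc, expected in library:
        try:
            if dict(extract(doc)) == dict(expected):
                hits += 1
        except Exception:
            pass  # a crash on a historical document counts as a miss
    return hits / len(library)


def gate(candidate: Extractor, library: Labelled,
         new_failures: Labelled, benchmark: float) -> str:
    """Return 'human_review' only if both halves of the gate hold, else 'escalate'."""
    if accuracy(candidate, library) < benchmark:
        return "escalate"  # regressed against the historical library
    if new_failures and accuracy(candidate, new_failures) < 1.0:
        return "escalate"  # did not resolve every new failure
    return "human_review"
```

Passing the gate only sends the change to human review; the gate never ships anything itself.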

What stays human

Every coverage decision, every payout, every customer contact. The loop maintains the plumbing; adjusters keep the judgment.

CNC & industrial manufacturing: telemetry that goes dark.

Machine fleets stream telemetry into dashboards and predictive-maintenance models. Then a firmware update changes a payload format, parsers break, and the dashboard gap gets noticed days later, sometimes after the signal it would have caught.

The loop

Triggers off the dead-letter queue within minutes of drift appearing, diffs the malformed payloads against the known schema, and drafts a parser update verified by replaying historical machine traffic.

The gate

The fix must process both the new payloads and the historical archive with no field silently dropped or re-typed.
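One way to check the "no field silently dropped or re-typed" half of this gate is to parse each archived payload with the current parser and the candidate, then compare the results field by field. The sketch below assumes parsers return flat dictionaries; `drift` and `archive_is_clean` are hypothetical names.

```python
from typing import Any, Callable, Iterable, Mapping

Parser = Callable[[bytes], Mapping[str, Any]]


def drift(before: Mapping[str, Any], after: Mapping[str, Any]) -> list[str]:
    """List fields the candidate parser dropped or re-typed for one payload."""
    problems = []
    for field, old_value in before.items():
        if field not in after:
            problems.append(f"dropped: {field}")
        elif type(after[field]) is not type(old_value):
            problems.append(
                f"re-typed: {field} "
                f"({type(old_value).__name__} -> {type(after[field]).__name__})"
            )
    return problems


def archive_is_clean(archive: Iterable[bytes], current: Parser, candidate: Parser) -> bool:
    """True when the candidate preserves every field and type across the archive."""
    return all(not drift(current(p), candidate(p)) for p in archive)
```

Fields the candidate adds are allowed here, since a firmware update may legitimately introduce new ones; only losses and type changes fail the check.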

What stays human

Alert thresholds, maintenance decisions, anything that touches the machines themselves. The loop repairs data plumbing; it never commands equipment.

Logistics: forty carrier feeds, none of them stable.

Shipment-status feeds from dozens of carriers, in EDI, flat files, and half-documented APIs. Carriers change formats without notice; shipments go dark on the tracking portal; the integrations team finds out from support tickets.

The loop

Fires when a carrier's unparsed-message rate spikes, consults its accumulated file of that carrier's past behaviour, and drafts a parser fix in an isolated workspace. One per carrier, so simultaneous breakages don't collide.
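The per-carrier trigger could be as simple as comparing each carrier's current unparsed rate with its own baseline. Everything numeric in this sketch is an assumption chosen for illustration: the 3x spike factor, the 1% baseline floor, and the minimum message count that keeps quiet carriers from firing on noise.

```python
from typing import Mapping


def carriers_to_trigger(
    counts: Mapping[str, tuple[int, int]],  # carrier -> (unparsed, total) this window
    baseline: Mapping[str, float],          # carrier -> usual unparsed rate
    factor: float = 3.0,                    # assumed definition of a "spike"
    min_messages: int = 50,                 # assumed floor to ignore tiny samples
) -> list[str]:
    """Return carriers whose unparsed-message rate spiked, one loop run each."""
    fired = []
    for carrier, (unparsed, total) in sorted(counts.items()):
        if total < min_messages:
            continue  # too few messages to call it a spike
        rate = unparsed / total
        usual = max(baseline.get(carrier, 0.0), 0.01)  # floor avoids a zero baseline
        if rate > factor * usual:
            fired.append(carrier)
    return fired
```

Returning one entry per carrier matches the one-workspace-per-carrier rule above, so two carriers breaking at once produce two independent runs.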

The gate

Golden-file replay. The fix must parse the new samples and re-parse that carrier's recent history with zero regressions in status, timestamps, and container IDs.

What stays human

New-carrier onboarding, customs documents, billing messages, every merge.

Real estate: listing data that's wrong somewhere, always.

Listings flow in from agents, franchises, and MLS feeds, then syndicate out to portals, each with its own field rules, photo requirements, and rejection quirks. Bad mappings mean listings silently fail to publish, price updates lag, and a property shows sold on one portal and available on another.

The loop

Runs against the syndication rejection reports daily, traces each rejection class to the mapping or validation code responsible, drafts the correction, and verifies it by re-running the full listing set through the updated pipeline.

The gate

Zero new rejections across the historical listing set, and no listing's published fields changed except the ones the fix targets.
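The second half of this gate, that nothing changes except the targeted fields, can be checked with a field-level diff of published output before and after the fix. A minimal sketch, assuming published listings are available as nested dictionaries keyed by listing ID; the function name is hypothetical, and the rejection count would be checked separately.

```python
from typing import Any, Iterable, Mapping

_MISSING = object()  # distinguishes "field absent" from "field set to None"

Published = Mapping[str, Mapping[str, Any]]  # listing ID -> published fields


def unexpected_changes(before: Published, after: Published,
                       targeted: Iterable[str]) -> dict[str, list[str]]:
    """Map each listing ID to the changed fields the fix was not meant to touch."""
    allowed = set(targeted)
    stray_by_listing = {}
    for listing_id in sorted(before.keys() | after.keys()):
        old = before.get(listing_id, {})
        new = after.get(listing_id, {})
        changed = {
            field
            for field in old.keys() | new.keys()
            if old.get(field, _MISSING) != new.get(field, _MISSING)
        }
        stray = sorted(changed - allowed)
        if stray:
            stray_by_listing[listing_id] = stray
    return stray_by_listing
```

An empty result means the fix stayed inside its declared scope; anything else is evidence to attach to the escalation.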

What stays human

Pricing, descriptions, anything a vendor signed off. The loop fixes how data moves, never what the property claims to be.

Financial services: the reconciliation backlog that only grows.

Daily settlement files against an internal ledger, and every morning a few hundred exceptions that mostly belong to a handful of recurring causes: rounding rules, midnight-cutoff timezones, duplicate webhooks. Ops classifies them by hand, and the backlog outpaces the team.

The loop

Takes one recurring break class at a time and iterates until a root-cause theory survives: replaying settlement history, bisecting when the class first appeared, testing candidate fixes to the matching logic.

The gate

Binary and unforgiving. A fix must reconcile ninety days of history with the known breaks resolved and zero new exceptions introduced, verified by a second agent with fresh context.

What stays human

The ledger. The loop fixes matching code; it never writes to balances, never touches funds. Its permission scope rules out escalating its own access.

Healthcare administration: remittance formats that never sit still.

Payer remittances and eligibility responses arrive in dialects. Every payer's implementation is slightly different, updated on their schedule, not yours. Parsing breaks show up as posting delays and denials nobody can explain.

The loop

Monitors posting-failure rates per payer, samples the failing transactions, and drafts parser updates proven against de-identified historical transaction sets inside your compliance boundary.

The gate

Replay across the historical set with posting accuracy at or above benchmark. The loop runs entirely inside your infrastructure, touching PHI only where your data-boundary rules already allow.

What stays human

Every clinical and coverage judgment, every appeal, every patient communication.

How a loop gets built

One build, end to end.

The industry examples above show where loops fit. This is how one actually gets built: the sequence we follow on every engagement, using the insurance intake case as the worked example.

Details are representative composites, anonymised.

Weeks 1 to 2: discovery

We sit with the team that handles extraction failures and write down what they actually do. Not the runbook, the reality. Which document sources break most, which fields are inferable and which are sacred, which carriers have known template quirks. Two outputs: an honest four-condition assessment of the workload, and the first draft of the skill file, the accumulated tribal knowledge, written down once, that the loop will read on every run. This is also where we model token spend at your document volumes and put a number in front of your CFO before anyone commits.

Weeks 3 to 4: one reliable manual run

Before anything is scheduled, an engineer runs the full cycle by hand: sample the failures, draft the fix, replay the historical library, review the diff. If the manual run isn't reliable, the loop won't be. Scheduling an unreliable process just automates the unreliability. This step is not optional and we don't skip it, even when it's tempting.

Weeks 5 to 8: the loop, assembled

The automation watches the failure queue with hard stops: token budget, iteration cap, time limit. The maker agent drafts fixes in an isolated workspace. A separate checker agent (different instructions, no access to the maker's reasoning) verifies the replay benchmark before any change reaches human review. Connectors wire it into your real tools: the repo, the ticket queue, the team's channel. The state file lives in your repo, recording every classification, every fix, every escalation, and the lessons learned that used to evaporate with staff turnover.

Weeks 9 to 12: hardening and handover

Permissions audited and scoped to least privilege. Security checks in the gate itself: dependency audit, secret scanning, no unreviewed code path to production. Your engineers run the loop, modify the skill file, and rehearse the off switch. Then we leave, and the system stays: documented, instrumented, and yours.

The number we build to

Cost per accepted change. Not tokens spent, not tasks attempted. If your team rejects half of what the loop drafts, the loop is losing, and the instrumentation will show it before the invoice does.
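Cost per accepted change is total spend across every run divided by the number of changes the team accepted, so rejected drafts raise the number rather than disappearing from it. A small sketch, with a hypothetical `Run` record and illustrative figures:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one loop run."""
    cost_usd: float  # token and compute spend for the run
    accepted: bool   # did a human accept the drafted change?


def cost_per_accepted_change(runs: Iterable[Run]) -> Optional[float]:
    """Total spend over all runs divided by accepted changes; None if none accepted."""
    runs = list(runs)
    accepted = sum(1 for r in runs if r.accepted)
    if accepted == 0:
        return None  # nothing shipped, so the metric is undefined, not zero
    return sum(r.cost_usd for r in runs) / accepted


# Illustrative figures only: four runs, half rejected.
runs = [Run(2.0, True), Run(3.0, False), Run(1.0, True), Run(2.0, False)]
print(cost_per_accepted_change(runs))  # 8.0 spent / 2 accepted = 4.0
```

In this example the accepted runs cost 3.0 in total, yet the metric reports 4.0 per accepted change, because the two rejected drafts were still paid for.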

Track record

We've run unsupervised production systems before.

Loop engineering is a new name. The discipline is not. For a regulated enterprise, we built a predictive auto-scaling engine that forecasts transaction load and scales a GPU fleet ahead of demand: autonomous, in production, inside the client's infrastructure, with hard limits and continuous observability. Different domain, same anatomy. A trigger, an objective gate, hard stops, and a system the client's team owns and operates without us.

~40% lower monthly compute

100% of inference inside client infra

24/7 observability

20+ companies supported worldwide · $5M+ business value created

The system

Five parts. Built inside your infrastructure.

Automations

Scheduled and event-driven triggers with objective stop conditions. The loop runs until the replay passes, not until a model feels finished. Every automation ships with hard stops: token budget, iteration cap, time limit.
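The three hard stops can be enforced by the code that drives the agent, so the stop never depends on the model deciding it is done. The sketch below is one possible shape under stated assumptions: `attempt` and `gate_passes` are hypothetical callables standing in for the agent call and the replay gate, and the outcome strings are illustrative.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class HardStops:
    max_tokens: int
    max_iterations: int
    max_seconds: float


def run_loop(
    attempt: Callable[[int], tuple[Any, int]],  # iteration -> (candidate, tokens used)
    gate_passes: Callable[[Any], bool],         # the objective gate, e.g. a replay
    stops: HardStops,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[str, Optional[Any]]:
    """Iterate until the gate passes or a hard stop fires; never ships on its own."""
    start = clock()
    tokens = 0
    for i in range(stops.max_iterations):
        if clock() - start >= stops.max_seconds:
            return ("escalate: time limit", None)
        candidate, used = attempt(i)
        tokens += used
        if gate_passes(candidate):
            return ("human_review", candidate)
        if tokens >= stops.max_tokens:
            return ("escalate: token budget", None)
    return ("escalate: iteration cap", None)
```

Every exit path is either a gate pass that goes to human review or an escalation with a named reason, which is what makes the stop conditions auditable afterwards.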

Skills

Your conventions and hard-won operational knowledge (the payer quirks, the carrier stunts, the "never touch this without asking" rules) written once and read on every run. Without this, a loop re-derives your context from zero every cycle. With it, intent compounds.

Connectors

MCP connectors into the tools you already run: the repo, the ticket queue, the messaging channel, the monitoring stack. The difference between an agent that describes a fix and a loop that ships one with the evidence attached.

Maker/checker splits

The agent that writes never grades its own work. A separate verifier (different instructions, often a different model) stands behind an objective gate. This is the structural decision that lets your team walk away while the loop runs.
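Structurally, the split comes down to what crosses the boundary between the two agents. In this sketch the checker receives only the proposed patch and never the maker's reasoning; the `Draft` and `Verdict` types and the `review` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Draft:
    patch: str      # the proposed change, e.g. a unified diff
    reasoning: str  # the maker's notes, deliberately withheld from the checker


@dataclass(frozen=True)
class Verdict:
    approved: bool
    evidence: str   # e.g. a replay summary attached for the human reviewer


def review(draft: Draft, checker: Callable[[str], Verdict]) -> Verdict:
    """Hand the checker the patch alone, so it judges the change and not the argument."""
    return checker(draft.patch)
```

Because the checker's signature accepts a patch string and nothing else, a persuasive but wrong explanation from the maker has no channel through which to influence the verdict.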

State

Persistent, version-controlled memory in your repo. The agent forgets between runs; the repo does not. Tomorrow's run resumes instead of restarting, and lessons learned get written down once instead of rediscovered nightly.
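A state file of this kind can be an append-only log that each run reads before it starts and writes to before it exits. The sketch below assumes a JSON Lines file committed to the repo; the record fields are illustrative and the function names are hypothetical.

```python
import json
from pathlib import Path


def append_record(state_file: Path, record: dict) -> None:
    """Append one run's outcome as a single JSON line (friendly to diffs and merges)."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    with state_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_records(state_file: Path) -> list[dict]:
    """Read every prior record so the next run resumes instead of restarting."""
    if not state_file.exists():
        return []
    with state_file.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

One record per line keeps each run's entry visible as a single added line in version control, which makes the loop's history reviewable like any other change.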

Scope, honestly

Loops for machine-checkable work. Humans for judgment calls.

Every use case above shares one property: a machine can reject bad output. That property is the boundary, and we hold it.

Loops never touch coverage decisions, payouts, ledger balances, pricing, clinical judgments, or physical equipment. They never deploy to production without a human gate, never modify their own permissions, and never expand scope for convenience. Security checks live in the gate itself (dependency audit, secret scanning, static analysis) because an unattended loop is an unattended attack surface, and we build like that's true. Permissions start read-only; every write permission is scoped, justified, and re-audited on a schedule.

What you keep

The loop, the skills, the gates. All of it is yours.

Everything runs inside your cloud and your data boundaries, with SOC 2, HIPAA, and region-specific requirements scoped upfront, which is not optional in half the industries on this page. The skill files encode your team's knowledge. The state lives in your repo. The gates are your tests and your data. When the engagement ends, you own a system your engineers can read, modify, and extend. Not a subscription to ours.

That is the moat. A loop you understand is a compounding asset. A loop you rent is someone else's.

Loops pair naturally with our other engagements. Forward Deployed Engineers embed with your team to build systems like these alongside you, and Claude Enablement covers the skills, MCP servers, and evals a loop runs on.

Questions worth answering

Loop engineering FAQ.

What is loop engineering?

Loop engineering is designing the system that supervises an AI agent so it can work unsupervised. The loop finds the work, hands it to an agent, verifies the result mechanically, records what happened, and decides the next move. It has five parts: automations with objective stop conditions, skill files that carry your operational knowledge, MCP connectors into your real tools, a maker/checker split so the agent never grades its own work, and persistent state version-controlled in your repo.

Should we build a loop?

Only if the workload passes four conditions, and only under all four. The work recurs at least weekly. A machine can reject bad output (a test suite, a schema validator, a replay against historical data). The budget absorbs the token spend from retries and exploration. And the agent has real tools: logs, historical data, and a safe environment to run what it writes. Miss one condition and the loop costs more than it returns.

Which operations suit a loop?

Any operation with repetitive, machine-checkable work that arrives on someone else's schedule. We build loops for insurance claims intake, CNC and industrial manufacturing telemetry, logistics carrier feeds (EDI, flat files, APIs), real estate listing syndication, financial services reconciliation, and healthcare remittance processing. The domain changes; the anatomy doesn't.

What is a gate?

The mechanical check that rejects bad work before a human sees it. In practice it is usually replay against historical data: a candidate fix must process the historical library at or above the current benchmark and resolve the new failures, with zero regressions, or it escalates instead of shipping. A separate checker agent with fresh context verifies the gate; the agent that wrote the fix never grades its own work.

How is an unsupervised loop kept safe?

Structure, not trust. The agent that writes never grades its own work; a separate verifier with different instructions, and often a different model, checks the result against an objective gate such as replay accuracy over historical data. Every loop has hard stops: a token budget, an iteration cap, and a time limit. Security checks live in the gate itself (dependency audit, secret scanning, static analysis), permissions start read-only, and nothing deploys to production without a human gate. The loop drafts, humans approve.

What does a loop cost to run?

Loops retry, re-read, and explore, so token spend happens whether or not a run ships anything. We model token economics at your scale during discovery, before you commit, not after the first invoice. The metric we build to is cost per accepted change, not tokens spent or tasks attempted, and we instrument it so the number is visible before the invoice is.

How long does a build take?

About 12 weeks from kickoff to a production loop your team operates. Weeks 1 to 2 are discovery: an honest four-condition assessment, the first draft of the skill file, and a token-spend model. Weeks 3 to 4 prove one reliable manual run, because scheduling an unreliable process just automates the unreliability. Weeks 5 to 8 assemble the loop. Weeks 9 to 12 harden permissions, put security checks in the gate, and hand over.

Do we own what gets built?

Yes, all of it. Everything runs inside your cloud and your data boundaries, with SOC 2, HIPAA, and region-specific requirements scoped upfront. The skill files, the state, the gates, and the source code are yours at handover. You own a system your engineers can read, modify, and extend. Not a subscription to ours.

Find out if your workload passes the test.

A 15-minute call is enough to run the four-condition assessment against your operation: insurance intake, machine telemetry, carrier feeds, listing syndication, reconciliation, or the recurring grind we haven't met yet. If a loop isn't worth building, we'll say so. That answer is free.

Book a call