AI & Automation · Human-in-the-Loop

The Pivot Point

I was the one arguing AI could not do my team's job. We were the fastest in the engagement at the work that actually moves a customer, running flat out and at capacity. Then I was told to make it work anyway.


The Terminator problem

The situation

The team was already the fastest in the engagement at the things that actually move a customer: live workarounds, hopping on calls, pulling logs and clones, running queries safely against a live instance, and writing the fix code for partners to deploy. When I was asked, repeatedly, to "add AI," I pushed back honestly. There was real work no model could touch.

Then the answer had to become yes. So the real question was never "should we use AI"; it was where AI genuinely removes load and where human judgment stays non-negotiable. The rule I built everything around: AI does the heavy, repetitive triage; a person always owns the customer.

I built it as four moves, each with a human checkpoint: an AI triage engine, an intake gate that stops incomplete tickets, a small governed fleet of tools, and the economics that keep it cheap and owned by the team.

Starting state

Team performance: fastest in org

Spare capacity: none

The mandate: add AI

Customer ownership: human only

The approach

Step 1: Sidecar, an AI triage engine with guardrails, not a black box

Sidecar is a containerized AI triage engine with MongoDB-backed persistence, encrypted secrets, and cloud-ready deployment. It integrates with Jira through the Atlassian API and routes every model call through our LLM proxy for hard cost control. It runs on reusable, per-brand "skills" so triage is consistent across products, with built-in guardrails and usage tracking on every call.
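The sketch below shows how per-brand skills, a guardrail, and usage tracking on every call can fit together. It is a minimal illustration and not Sidecar's code. `Skill`, `UsageRecord`, `TriageEngine`, and `fake_model` are hypothetical names. The proxy call is stubbed as a plain function that returns the draft plus token counts. Refusing to triage a brand that has no registered skill stands in for the guardrail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; Sidecar's real schema is not shown here.
@dataclass(frozen=True)
class Skill:
    brand: str
    instructions: str  # the per-brand triage prompt

@dataclass
class UsageRecord:
    ticket_key: str
    brand: str
    input_tokens: int
    output_tokens: int

# Stand-in for a call routed through the LLM proxy:
# (instructions, ticket_text) -> (draft_text, input_tokens, output_tokens)
ModelCall = Callable[[str, str], Tuple[str, int, int]]

@dataclass
class TriageEngine:
    skills: Dict[str, Skill]
    call_model: ModelCall
    usage: List[UsageRecord] = field(default_factory=list)

    def triage(self, ticket_key: str, brand: str, ticket_text: str) -> str:
        skill = self.skills.get(brand)
        if skill is None:
            # Guardrail: no registered skill for this brand means no automated triage.
            raise KeyError(f"no triage skill registered for brand {brand!r}")
        draft, tokens_in, tokens_out = self.call_model(skill.instructions, ticket_text)
        # Usage tracking on every call, attributed to the ticket that caused it.
        self.usage.append(UsageRecord(ticket_key, brand, tokens_in, tokens_out))
        return draft

# Example with a fake model so the sketch runs without network access.
def fake_model(instructions: str, ticket_text: str) -> Tuple[str, int, int]:
    return ("draft for: " + ticket_text, 120, 40)

engine = TriageEngine({"acme": Skill("acme", "Triage Acme tickets.")}, fake_model)
print(engine.triage("SUP-1", "acme", "crash on login"))  # draft for: crash on login
```

Injecting the model call as a function keeps the skill lookup and the usage record independent of any particular SDK or proxy.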

Diagram: Jira ticket → Sidecar (containerized, cloud-ready: Atlassian API integration, Anthropic SDK, "Brain" guardrails, product skills, and a proxy LLM for hard cost control, with usage tracking, encrypted secrets, and MongoDB) → triaged ticket and fix recommendation (never customer-facing; output scales with input quality) → engineer validates → development retests, improves, and approves.

What it actually produces: a triaged ticket and a drafted fix recommendation, never a customer-facing action. Output quality scales with the quality of the input ticket, so an engineer reviews and validates before anything reaches a development partner. Bug-fix integration has a placeholder for a code-fixing step, with a fallback to the standard source repositories.
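"Never a customer-facing action" can be enforced as a state machine: a drafted recommendation has no path to a partner except through a named engineer's sign-off. The sketch below is a hypothetical illustration of that rule. `Status`, `FixRecommendation`, and the state names are assumptions, not Sidecar's data model.

```python
from enum import Enum
from typing import Optional

class Status(Enum):
    DRAFTED = "drafted"      # written by the triage engine
    VALIDATED = "validated"  # reviewed and approved by an engineer
    SENT = "sent"            # handed to the development partner

class FixRecommendation:
    """Hypothetical model of a drafted fix that cannot skip human review."""

    def __init__(self, ticket_key: str, body: str) -> None:
        self.ticket_key = ticket_key
        self.body = body
        self.status = Status.DRAFTED
        self.validated_by: Optional[str] = None

    def validate(self, engineer: str) -> None:
        if self.status is not Status.DRAFTED:
            raise ValueError(f"cannot validate a recommendation in state {self.status.value}")
        self.validated_by = engineer
        self.status = Status.VALIDATED

    def send_to_partner(self) -> None:
        # No path from DRAFTED to SENT: a named engineer must sign off first.
        if self.status is not Status.VALIDATED:
            raise PermissionError("an engineer must validate this recommendation first")
        self.status = Status.SENT
```

Because the check lives on the transition itself, no calling code can skip the review by accident.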

The team behind the build

Sidecar was built and is maintained by the team, not handed to it. Building a unit that can own work like this, and run without me in the room, is the other half of the story.


Step 2: Gandalf stops incomplete tickets at the door (the human-in-the-loop gate)

Most wasted engineering time starts with a ticket that is missing the intel an engineer needs to act. Gandalf is the Jira gatekeeper that checks every incoming ticket against the brand's required-intel checklist. If anything is missing, nothing gets through: it assigns the ticket to itself, sets it to waiting, and tells the support agent exactly what is missing and why it is not enough to work yet. Only once the ticket is complete does it route to the next engineer in line.

Diagram: New ticket → Gandalf checklist check (steps to reproduce; expected vs actual; environment / version; brand context). Complete → routes to the next engineer in line, workable on arrival; the human engineer's decision stays there. Incomplete → Gandalf assigns the ticket to itself, sets status to Waiting, tells the support agent what is missing, and sends it back to the agent.

The point is not to replace the engineer; it is to guarantee that when a human picks up a ticket, it is actually workable. The gate is the AI; the decision stays with the person.
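Gandalf's decision logic can be sketched as a pure function, assuming a ticket arrives as a dictionary of field values and "next engineer in line" means a simple round-robin queue. The checklist uses the four items shown for this step. `REQUIRED_INTEL`, `GATE_USER`, `GateDecision`, `gate`, and the "Ready for engineer" status are hypothetical names. No Jira API calls are shown.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional

# The four checklist items from this step; a real brand's list may differ.
REQUIRED_INTEL = {
    "steps_to_reproduce": "Steps to reproduce",
    "expected_vs_actual": "Expected vs actual",
    "environment": "Environment / version",
    "brand_context": "Brand context",
}

GATE_USER = "gandalf"  # hypothetical Jira user the gate assigns to itself

@dataclass
class GateDecision:
    assignee: str
    status: str
    missing: List[str] = field(default_factory=list)
    message: Optional[str] = None

def gate(ticket: Dict[str, str], engineers: Deque[str]) -> GateDecision:
    # A field counts as missing if it is absent or blank.
    missing = [label for key, label in REQUIRED_INTEL.items()
               if not ticket.get(key, "").strip()]
    if missing:
        # Incomplete: keep the ticket, park it, and tell the agent exactly what is missing.
        message = ("Not workable yet. An engineer cannot act without: "
                   + ", ".join(missing) + ".")
        return GateDecision(GATE_USER, "Waiting", missing, message)
    # Complete: route to the next engineer in line, then move them to the back.
    engineer = engineers[0]
    engineers.rotate(-1)
    return GateDecision(engineer, "Ready for engineer")
```

Only complete tickets advance the queue, so a blocked ticket never costs an engineer their turn.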

The same discipline, on the vendor side

A gate that holds the quality line is the same instinct that holds vendors to account: set the standard, enforce it on every pass, and make the exception the thing that cannot happen.


Step 3: From one tool to a small, governed fleet

Backlog cleanup: bug-ticket housekeeping, built on Sidecar's skeleton, to keep the backlog clean automatically instead of by hand.

Sentiment & churn signals: a pair of traffic-light customer-sentiment and churn-risk scans, one for support and one for customer success (Zendesk + LiteLLM), feeding a "save vs hand-off" retention framework so the renewals team acts on risk early, before renewal rather than at it.

An AI-generated test corpus: 910 buildable application-under-test samples plus 14 demo apps across .NET, Java, and JavaScript version matrices, paired with Claude Code "attack" skills that score code-protection strength against known ground truth.

A vendor-contract-management app: replacing the spreadsheet that contract data used to live in, so contracts are tracked, searchable, and governed in an actual app.
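The traffic-light idea can be sketched as a mapping from a risk score to a light, combined across the two scans. This is a minimal illustration under stated assumptions. The 0 to 1 score scale and the `AMBER_AT` and `RED_AT` cut-offs are hypothetical; the real thresholds and scoring are not described here. The "worst light wins" rule is also an assumption, chosen so that a red from either team is never averaged away.

```python
from enum import IntEnum

class Light(IntEnum):
    GREEN = 0
    AMBER = 1
    RED = 2

# Hypothetical cut-offs on a 0..1 risk score; the real thresholds are not published.
AMBER_AT = 0.4
RED_AT = 0.7

def to_light(risk: float) -> Light:
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk score must be in [0, 1]")
    if risk >= RED_AT:
        return Light.RED
    if risk >= AMBER_AT:
        return Light.AMBER
    return Light.GREEN

def account_light(support_risk: float, success_risk: float) -> Light:
    # Worst of the two scans wins, so one team's red is never averaged away.
    return max(to_light(support_risk), to_light(success_risk))
```

Any non-green account would then go into the save-vs-hand-off decision, which stays with the renewals team.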


Where this fleet came from

This fleet is another result of scaling a team's output without a single extra hire: same people, broader coverage, one operating map.


Step 4: Cheap, governed, and owned by the team

LiteLLM caps and routes spend, and usage tracking attributes cost to every ticket. A perfectly prepared ticket lands at around four cents; a "no-info" ticket can run up to $10 before a person ever touches it, which is exactly why the intake gate pays for itself. The quality holds because a human still signs off. Most important for a tool an embedded leader builds: the team owns and maintains the stack, so it does not walk out the door when I do.
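The arithmetic behind "the intake gate pays for itself" is short: at the $10 ceiling, one no-info ticket costs as much as 250 well-prepared tickets at four cents each. In the real stack, LiteLLM does the capping and routing. The ledger below is a hypothetical plain-Python illustration of per-ticket attribution with a hard cap, not LiteLLM's API. `CostLedger` and the cent constants are assumed names, and amounts are integer cents to avoid float rounding.

```python
from collections import defaultdict

# Figures from the paragraph above, in integer cents to avoid float rounding.
WELL_PREPARED_CENTS = 4        # a well-prepared ticket: around four cents
NO_INFO_CEILING_CENTS = 1000   # a "no-info" ticket: up to $10

class CostLedger:
    """Hypothetical per-ticket ledger: attribute every call, refuse any call past the cap."""

    def __init__(self, cap_cents: int) -> None:
        self.cap_cents = cap_cents
        self.spent = defaultdict(int)  # ticket key -> cents spent so far

    def charge(self, ticket: str, cost_cents: int) -> bool:
        if self.spent[ticket] + cost_cents > self.cap_cents:
            return False  # hard cap: the call is not made
        self.spent[ticket] += cost_cents
        return True

    def total_cents(self) -> int:
        return sum(self.spent.values())

# One no-info ticket at the ceiling costs as much as this many well-prepared ones:
print(NO_INFO_CEILING_CENTS // WELL_PREPARED_CENTS)  # 250
```

Checking the cap before recording the charge means a refused call leaves the ticket's total unchanged.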

What Sidecar does

Makes a good reporter faster

Triages and drafts fix recommendations

Scales throughput at ~$0.04 / ticket

Attributes cost per ticket, every call

Team owns and maintains the stack

What it does not do

Rescue a bad ticket

Act without human validation

Replace the engineer's judgment

Touch the customer directly

Walk out the door with the leader

The honest limit: output quality follows input quality

The output is only ever as good as the ticket that goes in. Sidecar makes a good reporter faster; it does not rescue a bad ticket. That is by design: the human is still the quality bar.

The same discipline, on the cost side

Hard cost control on every call is the same instinct that turned unmanaged spend into measured savings elsewhere: make the cost legible, then make it earn its place.


The results

In-house tools

A governed fleet: triage engine, intake gate, backlog cleanup, sentiment signals, test corpus, and contracts app.

Human-in-the-loop

100%

Every fix recommendation validated by a person before it reaches a partner or customer.

Customer outcomes handed to a bot

None

The AI triages and drafts; a person owns every customer outcome, start to finish.

AUT test corpus

910 + 14

Buildable application-under-test samples plus complex demo apps across .NET, Java, and JavaScript.

Cost control

Capped

Spend routed and capped on every call, attributed to each ticket, so token cost never runs away.

Owned by the team

In-house

Built to be maintained by the team, not orphaned shadow IT when the leader moves on.

The AI took the heavy, repetitive load while a person stayed on every customer outcome. Spend stayed capped and attributed to each ticket. And because the team owns and maintains the stack, none of it walked out the door when I did.

Black Mirror edition

What made it hard

Arguing against the AI hype while running the highest-performing team is an uncomfortable place to stand. It is easy to be read as resistant rather than careful. The credibility came from being specific about what AI could not do, then proving I would build it the moment it could actually help.

The real risk was eroding the human judgment that made the team fast. So every tool was designed to remove load, not decisions: the gate enforces intake quality, the triage drafts but never acts, and a person owns the customer end to end.

The last test of any tool an embedded leader builds is whether it survives them. Keeping cost controlled and handing ownership to the team, instead of leaving behind orphaned shadow IT, was the difference between a clever demo and something that lasts.

Want AI that removes load, not judgment?

Let's talk about building AI your team actually trusts.
