The Pivot Point
I was the one arguing AI could not do my team's job. We were the fastest in the engagement at the work that actually moves a customer, running flat out and at capacity. Then I was told to make it work anyway.
The Terminator problem
The situation
The team was already the fastest in the engagement at the things that actually move a customer: live workarounds, hopping on calls, pulling logs and clones, running queries safely against a live instance, and writing the fix code for partners to deploy. When I was asked, repeatedly, to "add AI," I pushed back honestly. There was real work no model could touch.
Then the answer had to become yes. So the real question was never "should we use AI"; it was where AI genuinely removes load and where human judgment stays non-negotiable. The rule I built everything around: AI does the heavy, repetitive triage; a person always owns the customer.
I built it as four moves, each with a human checkpoint: an AI triage engine, an intake gate that stops incomplete tickets, a small governed fleet of tools, and the economics that keep it cheap and owned by the team.
The approach
Step 1: Sidecar, an AI triage engine with guardrails, not a black box
Sidecar is a containerized AI triage engine with MongoDB-backed persistence, encrypted secrets, and cloud-ready deployment. It integrates with Jira through the Atlassian API and routes models through our LLM proxy for hard cost control. It runs on reusable, per-brand "skills" so triage is consistent across products, with built-in guardrails and usage tracking on every call.
What it actually produces: a triaged ticket and a drafted fix recommendation, never a customer-facing action. Output quality scales with the quality of the input ticket, so an engineer reviews and validates before anything reaches a development partner. Bug-fix integration has a placeholder for a code-fixing step, with a fallback to the standard source repositories.
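A minimal sketch of that checkpoint, not Sidecar's actual code. The names `DraftRecommendation`, `approve`, and `release_to_partner` are hypothetical, as is the ticket key format. It shows one way to make "drafted, never sent" a property of the data itself: a draft cannot be released until a named engineer has signed off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftRecommendation:
    """AI-drafted fix recommendation (hypothetical shape, for illustration only)."""
    ticket_key: str
    summary: str
    reviewed_by: Optional[str] = None  # filled in only by a human reviewer

    def approve(self, engineer: str) -> None:
        """Record the engineer who validated the draft."""
        if not engineer.strip():
            raise ValueError("a named engineer must approve the draft")
        self.reviewed_by = engineer


def release_to_partner(draft: DraftRecommendation) -> str:
    """Refuse to release anything that no person has signed off on."""
    if draft.reviewed_by is None:
        raise PermissionError(f"{draft.ticket_key}: draft has not been reviewed")
    return f"{draft.ticket_key} released, validated by {draft.reviewed_by}"
```

Putting the check in the release function rather than in the caller means every path to a partner goes through the same gate.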
The team behind the build
Sidecar was built and is maintained by the team, not handed to it. Building a unit that can own work like this, and run without me in the room, is the other half of the story.
Read Build What Lasts →
Step 2: Gandalf stops incomplete tickets at the door (the human-in-the-loop gate)
Most wasted engineering time starts with a ticket that is missing the intel an engineer needs to act. Gandalf is the Jira gatekeeper that checks every incoming ticket against the brand's required-intel checklist. If anything is missing, nothing gets through: it assigns the ticket to itself, sets it to waiting, and tells the support agent exactly what is missing and why it is not enough to work yet. Only once the ticket is complete does it route to the next engineer in line.
CHECKLIST
✓ Steps to reproduce
✓ Expected vs actual
✓ Environment / version
✓ Brand context
Complete → routes to the next engineer in line
Incomplete → back to the support agent
The point is not to replace the engineer; it is to guarantee that when a human picks up a ticket, it is actually workable. The gate is the AI; the decision stays with the person.
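The gate logic can be sketched roughly as follows. This is an illustration under assumptions, not Gandalf's implementation: the four checklist labels come from the text, but the field keys, the `"waiting"` and `"open"` status names, and the `"gandalf"` assignee string are hypothetical, and a single checklist stands in for the per-brand ones.

```python
from dataclasses import dataclass, field

# Labels are from the checklist above; the dictionary keys are hypothetical.
REQUIRED_INTEL = {
    "steps_to_reproduce": "Steps to reproduce",
    "expected_vs_actual": "Expected vs actual",
    "environment_version": "Environment / version",
    "brand_context": "Brand context",
}


@dataclass
class GateDecision:
    complete: bool
    assignee: str
    status: str
    missing: list[str] = field(default_factory=list)


def check_ticket(fields: dict[str, str], next_engineer: str) -> GateDecision:
    """Hold a ticket at the gate unless every required item is filled in."""
    missing = [
        label
        for key, label in REQUIRED_INTEL.items()
        if not (fields.get(key) or "").strip()
    ]
    if missing:
        # Incomplete: the gate keeps the ticket and reports exactly what is missing.
        return GateDecision(False, "gandalf", "waiting", missing)
    # Complete: route to the next engineer in line.
    return GateDecision(True, next_engineer, "open", [])
```

Returning the list of missing items, rather than a bare pass/fail, is what lets the support agent be told exactly what to add.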
The same discipline, on the vendor side
A gate that holds the quality line is the same instinct that holds vendors to account: set the standard, enforce it on every pass, and make the exception the thing that cannot happen.
Read Hold The Line →
Step 3: From one tool to a small, governed fleet
Backlog cleanup: bug-ticket housekeeping, built on Sidecar's skeleton, to keep the backlog clean automatically instead of by hand.
Sentiment & churn signals: a pair of traffic-light customer-sentiment and churn-risk scans, one for support and one for customer success (Zendesk + LiteLLM), feeding a "save vs hand-off" retention framework so the renewals team acts on risk early, before renewal rather than at it.
An AI-generated test corpus: 910 buildable application-under-test samples plus 14 demo apps across .NET, Java, and JavaScript version matrices, paired with Claude Code "attack" skills that score code-protection strength against known ground truth.
A vendor-contract-management app: replacing the spreadsheet that contract data used to live in.
01 Backlog cleanup: Bug-ticket housekeeping. Keeps the backlog clean automatically, not by hand.
02 Sentiment & churn: Traffic-light sentiment and churn-risk, for support and customer success. Feeds a save-vs-hand-off retention framework.
03 Test corpus: 910 buildable test samples plus 14 demo apps. Claude Code "attack" skills score code-protection strength.
04 Vendor contracts: Replaces the spreadsheet. Contract data in an actual app: tracked, searchable, governed.
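Scoring against known ground truth, as the test-corpus "attack" skills do, could look something like the sketch below. The metric is an assumption chosen for illustration, not the one the skills actually use: it treats protection strength as the share of known identifiers an attack failed to recover. The function name and the set-based inputs are hypothetical.

```python
def protection_score(ground_truth: set[str], recovered: set[str]) -> float:
    """Share of known items the attack failed to recover (1.0 means nothing leaked)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    # Only count recoveries that match the ground truth; wrong guesses are ignored.
    leaked = ground_truth & recovered
    return 1.0 - len(leaked) / len(ground_truth)
```

Because each sample is generated with its ground truth attached, a score like this can be computed without anyone inspecting the attack output by hand.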
Where this fleet came from
This fleet is another result of scaling a team's output without a single extra hire: same people, broader coverage, one operating map.
Read Clear The Fog →
Step 4: Cheap, governed, and owned by the team
LiteLLM caps and routes spend, and usage tracking attributes cost to every ticket. A perfectly prepared ticket lands at around four cents; a "no-info" ticket can run up to $10 before a person ever touches it, which is exactly why the intake gate pays for itself. The quality holds because a human still signs off. Most important for a tool an embedded leader builds: the team owns and maintains the stack, so it does not walk out the door when I do.
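To make the arithmetic concrete, here is a small sketch of per-ticket cost attribution with a cap. It is plain Python and does not use or represent LiteLLM's API; `TicketLedger`, the ticket keys, and the $10.00 cap value are hypothetical. The two cost figures are the ones quoted above.

```python
from collections import defaultdict
from decimal import Decimal


class TicketLedger:
    """Attribute model spend to tickets and stop at a per-ticket cap (illustrative)."""

    def __init__(self, cap_usd: Decimal) -> None:
        self.cap_usd = cap_usd
        self.spent: dict[str, Decimal] = defaultdict(Decimal)

    def record(self, ticket_key: str, cost_usd: Decimal) -> None:
        """Add one call's cost to its ticket, refusing calls that would exceed the cap."""
        if self.spent[ticket_key] + cost_usd > self.cap_usd:
            raise RuntimeError(f"{ticket_key}: per-ticket cap of ${self.cap_usd} reached")
        self.spent[ticket_key] += cost_usd


ledger = TicketLedger(cap_usd=Decimal("10.00"))  # cap value is an assumption
ledger.record("SUP-1", Decimal("0.04"))   # well-prepared ticket
ledger.record("SUP-2", Decimal("10.00"))  # "no-info" ticket at its upper bound
ratio = ledger.spent["SUP-2"] / ledger.spent["SUP-1"]
print(f"no-info vs prepared: {ratio}x")
```

At those two figures, a single no-info ticket costs as much as 250 well-prepared ones, which is the case for stopping incomplete tickets before any model call is made.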
The honest limit: output quality follows input quality
The output is only ever as good as the ticket that goes in. Sidecar makes a good reporter faster; it does not rescue a bad ticket. That is by design: the human is still the quality bar.
The same discipline, on the cost side
Hard cost control on every call is the same instinct that turned unmanaged spend into measured savings elsewhere: make the cost legible, then make it earn its place.
Read Stop The Bleed →
The results
In-house tools: 6
A governed fleet: triage engine, intake gate, backlog cleanup, sentiment signals, test corpus, and contracts app.
Human-in-the-loop: 100%
Every fix recommendation validated by a person before it reaches a partner or customer.
Handed to a bot: 0
The AI triages and drafts; a person owns every customer outcome, start to finish.
AUT test corpus: 910 + 14
Buildable application-under-test samples plus complex demo apps across .NET, Java, and JavaScript.
Cost control: Capped
Spend routed and capped on every call, attributed to each ticket, so token cost never runs away.
Owned by the team: In-house
Built to be maintained by the team, not orphaned shadow IT when the leader moves on.
The AI took the heavy, repetitive load while a person stayed on every customer outcome, spend stayed capped and attributed to each ticket, and because the team owns and maintains the stack, none of it walked out the door when I did.
Black Mirror edition
What made it hard
Arguing against the AI hype while running the highest-performing team is an uncomfortable place to stand. It is easy to be read as resistant rather than careful. The credibility came from being specific about what AI could not do, then proving I would build it the moment it could actually help.
The real risk was eroding the human judgment that made the team fast. So every tool was designed to remove load, not decisions: the gate enforces intake quality, the triage drafts but never acts, and a person owns the customer end to end.
The last test of any tool an embedded leader builds is whether it survives them. Keeping cost controlled and handing ownership to the team, instead of leaving behind orphaned shadow IT, was the difference between a clever demo and something that lasts.
Want AI that removes load, not judgment?
Let's talk about building AI your team actually trusts.