AI Automation
Actually Useful AI.
Custom agents, intelligent workflows, and the connective tissue between LLMs and your real business systems.
The Problem
Most AI work is theater. A homepage chatbot that answers FAQs slightly better than a search box. An 'AI assistant' nobody opens after week two. A 40-page vendor report on automation potential with no working code attached. The tools are genuinely powerful — but only when they're pointed at the right problem. The work that actually pays for itself is unglamorous: drafting outreach, classifying support tickets, summarizing calls, generating first-pass documents. Boring, repetitive tasks that cost real money when humans do them.
My Approach
I start with a workflow audit — shadow your team for a week, find the three best automation candidates, and build a pilot for exactly one. The pilot runs in production with a human-in-the-loop approval step so your team stays in control. I measure actual hours saved against actual build cost. If the ROI is there, we scale to the next workflow. If it isn't, I tell you honestly and we stop. No enterprise AI theater. No six-month implementation projects.
This is right for you if…
- Your ops team runs the same workflow — drafting, classifying, summarizing — more than 50 times a week
- Your sales team spends more than two hours a day on manual outreach that follows a predictable pattern
- You have a support queue where the first response is almost always the same, written fresh every time
- You want to test AI ROI with a real pilot before committing to a larger build
- You have an existing LLM integration that isn't reliable in production and needs proper engineering
Not the right lane if…
- 'We want to add AI to our product' without a specific workflow problem — we need a clear target first
- Consumer-facing AI features without a safety and eval strategy (I'll build this, but it needs a different scope)
- Real-time AI inference requiring sub-100ms latency — different architecture, different engagement
Honest fit assessment is part of every first conversation. If this isn't the right service, I'll tell you which one is — or refer you out entirely.
The Process
Workflow Audit
Week 1: I shadow your team, map every candidate workflow, and rank them by automation potential and ROI. You get a written assessment, not a slide deck.
Pilot Spec
Weeks 1–2: We pick one workflow. I define the inputs, outputs, success criteria, cost ceiling, and the human review checkpoints before writing a line of code.
Pilot Build
Weeks 2–5: A production-grade pipeline with structured prompts, output validation, retry logic, observability, and a review UI your team can actually use.
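To make "output validation" and "retry logic" concrete, here is a minimal Python sketch for a ticket-classification step. It is an illustration under stated assumptions, not the pipeline from any engagement: the label set, the JSON reply shape, and the `call_model` callable (a stand-in for whatever LLM client is used) are all hypothetical.

```python
import json
import time
from typing import Callable

# Hypothetical ticket categories, chosen only for illustration.
ALLOWED_LABELS = {"billing", "bug", "how_to", "other"}


def validate(raw: str) -> dict:
    """Parse a model reply and reject anything outside the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label = data.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return data


def classify_with_retry(ticket: str, call_model: Callable[[str], str],
                        max_attempts: int = 3, backoff_s: float = 0.0) -> dict:
    """Call the model, validate its reply, and retry when validation fails."""
    errors = []  # kept so a final failure explains every rejected attempt
    for attempt in range(1, max_attempts + 1):
        raw = call_model(ticket)
        try:
            return validate(raw)
        except ValueError as err:
            errors.append(f"attempt {attempt}: {err}")
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise RuntimeError("no valid output; " + "; ".join(errors))
```

The point of the design is that an invalid reply never reaches the review UI: it is either retried or surfaced as an explicit failure with the reasons attached.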
Measure & Decide
Weeks 5–6: We run the pilot with real work. I calculate actual hours saved, actual cost per task, and actual error rate. The numbers decide what happens next.
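The three numbers named above (hours saved, cost per task, error rate) reduce to simple arithmetic. The sketch below shows one way to compute them in Python. The field names and the break-even calculation are assumptions made for illustration, and any figures you feed in are your own measurements, not benchmarks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PilotResults:
    tasks_completed: int
    tasks_needing_correction: int   # outputs a reviewer had to fix or reject
    manual_minutes_per_task: float  # time the task took before the pilot
    review_minutes_per_task: float  # human review time that remains
    model_cost_total: float         # model/API spend over the pilot period
    hourly_labor_cost: float
    build_cost: float


def summarize(r: PilotResults) -> dict:
    """Turn raw pilot measurements into the metrics used for the decision."""
    if r.tasks_completed <= 0:
        raise ValueError("the pilot must have completed at least one task")
    minutes_saved = r.tasks_completed * (
        r.manual_minutes_per_task - r.review_minutes_per_task
    )
    hours_saved = minutes_saved / 60
    net_savings = hours_saved * r.hourly_labor_cost - r.model_cost_total
    break_even: Optional[float] = (
        r.build_cost / net_savings if net_savings > 0 else None
    )
    return {
        "hours_saved": hours_saved,
        "cost_per_task": r.model_cost_total / r.tasks_completed,
        "error_rate": r.tasks_needing_correction / r.tasks_completed,
        "net_savings": net_savings,
        # Number of pilot-length periods needed to recover the build cost.
        "periods_to_break_even": break_even,
    }
```

A `periods_to_break_even` of `None` means the pilot never pays back its build cost at the measured rates, which is the "we stop" outcome.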
Scale
Weeks 6–10: If the pilot proves out, we expand to the next workflow using the same architecture. If it doesn't, we stop and I tell you exactly why.
Deliverables
Every engagement ends with a clean handoff — not just working code. You should be able to own and extend what I build without a developer dependency.
- Workflow audit report: Written assessment of automation candidates, ranked by ROI. No slide decks.
- Agent pipeline: Production-grade, with structured prompts, output validation, retries, evals, and observability.
- Human-in-the-loop UI: Approval queues, edit-before-send, override controls. Your team stays in control at every step.
- Cost ceiling & monitoring: Hard per-request budget caps, weekly cost reports, and alerts before you hit anything unexpected.
- Eval suite: A test set of representative inputs with expected outputs. You can re-run it any time you change a prompt.
- ROI report: I measure actual hours saved against actual build cost. If the math doesn't work, you hear it from me first.
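As an illustration of what a "hard per-request budget cap" can look like, the Python sketch below rejects a request before it is sent if its worst-case cost exceeds the cap. The `Pricing` fields, the per-1k-token pricing model, and the function names are assumptions for this example. Real prices and token counts come from your model provider.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a request's worst-case cost is above the cap."""


@dataclass(frozen=True)
class Pricing:
    usd_per_1k_input: float   # hypothetical price per 1,000 input tokens
    usd_per_1k_output: float  # hypothetical price per 1,000 output tokens


def worst_case_cost(input_tokens: int, max_output_tokens: int,
                    pricing: Pricing) -> float:
    """Upper bound on cost, assuming the model uses its full output allowance."""
    return (
        input_tokens / 1000 * pricing.usd_per_1k_input
        + max_output_tokens / 1000 * pricing.usd_per_1k_output
    )


def enforce_cap(input_tokens: int, max_output_tokens: int,
                pricing: Pricing, cap_usd: float) -> float:
    """Return the worst-case cost, or raise before any money is spent."""
    cost = worst_case_cost(input_tokens, max_output_tokens, pricing)
    if cost > cap_usd:
        raise BudgetExceeded(
            f"worst-case cost ${cost:.4f} exceeds cap ${cap_usd:.4f}"
        )
    return cost
```

Checking the worst case up front, rather than the actual cost afterwards, is what makes the cap "hard": an oversized request fails loudly instead of showing up later in a cost report.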
Relevant Work
MarketStar
LinkedIn lead enrichment pipeline: Apollo → email validation → profile scrape → AI summary. Per-lead research down from 45 min to 3 min — 480 hrs/yr recovered.
Riviera Social Club
WhatsApp AI assistant reading a live Google Doc knowledge base. Handles 87% of inbound questions with sub-30s response time — zero new tools for staff.
"The AI pipeline alone saves us 20+ hours every week. Genuinely remarkable work, and it actually sounds like us, not a robot."
FAQ
Do I need to understand AI to work with you?
No. My job is to translate between AI capabilities and your business problem. You describe the workflow in plain English. I tell you whether it's a good automation candidate and how I'd build it reliably.
How do I know the AI output will be accurate?
We build an eval suite before shipping — test sets of representative inputs with expected outputs. Every pipeline has structured output validation, a human review step for high-stakes decisions, and logging so you can audit every result.
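A minimal sketch of such an eval suite in Python, assuming a classification-style task where exact string match is a fair comparison (free-text outputs would need a different scoring rule). The example cases and the `keyword_pipeline` stand-in are hypothetical. In practice the `pipeline` argument would be the real prompt-plus-model call.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical test set: (input, expected output) pairs.
CASES = [
    ("I was charged twice", "billing"),
    ("How do I reset my password?", "how_to"),
    ("Where is my invoice charge?", "billing"),
]


def keyword_pipeline(text: str) -> str:
    """Toy stand-in for the real pipeline, used only to exercise the runner."""
    return "billing" if "charge" in text.lower() else "other"


def run_evals(cases: Iterable[Tuple[str, str]],
              pipeline: Callable[[str], str]) -> dict:
    """Run every case through the pipeline and collect the mismatches."""
    total = 0
    failures = []
    for text, expected in cases:
        total += 1
        actual = pipeline(text)
        if actual != expected:
            failures.append(
                {"input": text, "expected": expected, "actual": actual}
            )
    return {"total": total, "passed": total - len(failures), "failures": failures}
```

Re-running `run_evals(CASES, pipeline)` after every prompt change turns "did I break anything?" into a list of specific failing inputs.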
What if the pilot doesn't show ROI?
We stop, and I tell you why in writing. I'd rather lose the Scale engagement than build something that doesn't pay for itself. An honest assessment is part of the deliverable — not a consolation prize.
Can the AI be trained on our company voice or data?
Yes. For tone and voice, I use few-shot prompting and chain-of-thought techniques that produce output that sounds like your team. For domain knowledge, I use RAG — retrieval from your actual documents — rather than fine-tuning, which is usually overkill and significantly more expensive.
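The combination of retrieval and few-shot examples can be sketched in a few lines of Python. This is a toy under stated assumptions: retrieval here is plain word overlap, standing in for the embedding-based search a real RAG setup would normally use, and the prompt wording and function names are hypothetical.

```python
import re
from typing import List, Set, Tuple


def words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, used for naive overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Return up to k documents sharing the most words with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    # Drop documents with no overlap at all rather than padding the context.
    return [d for d in ranked[:k] if q & words(d)]


def build_prompt(question: str, documents: List[str],
                 examples: List[Tuple[str, str]]) -> str:
    """Assemble retrieved context plus few-shot replies written by the team."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    shots = "\n\n".join(f"Question: {q}\nReply: {a}" for q, a in examples)
    return (
        "Answer using only the context below, "
        "in the same voice as the example replies.\n\n"
        f"Context:\n{context}\n\n"
        f"Example replies:\n{shots}\n\n"
        f"Question: {question}\nReply:"
    )
```

The retrieved context carries the facts and the example replies carry the voice, which is why neither requires changing the model itself.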
What about data privacy and security?
Every pipeline is architected with data minimisation in mind — we pass only what's necessary to the model. I can work with self-hosted models (Ollama, local inference) for sensitive data, or implement data-at-rest encryption for stored outputs. Exact controls depend on your compliance requirements.
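As a rough illustration of "pass only what's necessary", the Python sketch below builds the model payload from an allowlist of fields and masks email addresses in the text that remains. The field names and the single email pattern are assumptions for the example. Real redaction rules depend on your data and compliance requirements and usually cover more than emails.

```python
import re

# Hypothetical allowlist: only these record fields ever reach the model.
ALLOWED_FIELDS = ("subject", "body", "product")

# Simple email pattern for illustration; not a full address validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def minimise(record: dict) -> dict:
    """Keep allowlisted fields only and mask email addresses inside them."""
    payload = {}
    for field in ALLOWED_FIELDS:
        if field in record:
            payload[field] = EMAIL_RE.sub("[email]", str(record[field]))
    return payload
```

An allowlist fails safe: a new sensitive column added to the source system stays out of the prompt until someone deliberately adds it.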
Do you handle AI integrations into existing products?
Yes. If you have an existing product and want to add an AI feature — copilot, autocomplete, document summarisation — I can scope and build the integration. The same pilot-first approach applies: one feature, production-grade, measured before expanding.