AI Delivery Playbook

The one rule

Cheapest thing that could work → prove it → only then spend more.

We sell AI delivery at a low price point. That only works if the method is repeatable and nobody re-invents decisions on every job. Bigger model, RAG, fine-tuning, more infra — each is a cost you justify with a number on an eval scorecard, never a hunch.

The seven phases

From "what do they want" to "live & watched"

Discovery & Scoping

A one-sentence problem statement. 5–10 example input/output pairs (these become the eval set). An explicit list of what's out of scope.
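The three scoping deliverables can be captured as a small record and checked before any build work starts. This is a minimal sketch, not a prescribed format: the names Example, Scope and scope_issues are hypothetical, and the only rule taken from the text is the 5-10 example range.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Example:
    input: str      # what the user or upstream system sends
    expected: str   # the answer the client would accept


@dataclass
class Scope:
    problem: str                                   # one-sentence problem statement
    examples: list = field(default_factory=list)   # list of Example; later reused as the eval set
    out_of_scope: list = field(default_factory=list)


def scope_issues(scope: Scope) -> list:
    """Return human-readable gaps in the scoping output (empty list = ready)."""
    issues = []
    if not scope.problem.strip():
        issues.append("missing one-sentence problem statement")
    if not 5 <= len(scope.examples) <= 10:
        issues.append(f"expected 5-10 example pairs, got {len(scope.examples)}")
    if not scope.out_of_scope:
        issues.append("nothing listed as out of scope")
    return issues
```

An empty list from scope_issues means the scope is complete enough to move on. The examples list is the same object the evaluation phase would later score against.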

Data & Knowledge

What knowledge does the answer need, is it stable or live, and where does it live?

Approach

Prompt-only, long-context, RAG, or fine-tune? Follow the decision tree, cheapest first.
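The decision tree itself is not reproduced in this summary, so the following is only an illustrative sketch of a cheapest-first filter. The branch rules, the token thresholds and the names Knowledge and viable_approaches are all assumptions made for the example. The order of APPROACHES follows the order in which the text lists them, which is assumed here to be cost order.

```python
from dataclasses import dataclass

# Listed order from the text, assumed to run from cheapest to most expensive.
APPROACHES = ["prompt-only", "long-context", "rag", "fine-tune"]


@dataclass
class Knowledge:
    tokens: int   # rough size of the knowledge the answer needs
    live: bool    # True if it changes often enough that a snapshot goes stale


def viable_approaches(k: Knowledge,
                      prompt_budget: int = 4_000,
                      context_window: int = 100_000) -> list:
    """Return approaches worth trying, cheapest first (hypothetical rules)."""
    viable = []
    if not k.live and k.tokens <= prompt_budget:
        viable.append("prompt-only")     # small, stable knowledge fits in the prompt
    if not k.live and k.tokens <= context_window:
        viable.append("long-context")    # stable knowledge that fits in one window
    viable.append("rag")                 # retrieval copes with large or live knowledge
    if not k.live:
        viable.append("fine-tune")       # baked-in knowledge goes stale if data is live
    return viable
```

The first entry of the returned list is the one to try first. Later entries are only reached when the eval rejects the cheaper ones.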

Model Selection

Start at the cheapest tier. Climb only when the eval forces it. Use a stronger model than the one under test as the judge.
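The climbing rule can be sketched as a loop that stops at the first tier that passes. The function name pick_model, the tier names and the threshold are hypothetical. The score callable stands in for whatever eval run the project uses.

```python
from typing import Callable, Optional, Sequence


def pick_model(tiers: Sequence[str],
               score: Callable[[str], float],
               threshold: float) -> Optional[str]:
    """Return the first tier whose eval score meets the threshold.

    `tiers` must be ordered cheapest first. More expensive tiers are never
    evaluated once a cheaper one passes. Returns None if no tier passes.
    """
    for model in tiers:
        if score(model) >= threshold:
            return model
    return None
```

A None result means no tier cleared the bar, which points back to the approach or the data rather than to a bigger model.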

Thin Slice

Smallest end-to-end version of one real example — with guardrails from day one.
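What "guardrails from day one" could look like in a thin slice, as a minimal sketch: a wrapper that rejects empty or oversized input and falls back to a safe message when the model call fails or returns nothing. The specific checks, the character limit, the fallback text and the name guarded are assumptions for illustration. The text does not say which guardrails to use.

```python
from typing import Callable

FALLBACK = "Sorry, I can't help with that request."


def guarded(model: Callable[[str], str],
            max_input_chars: int = 4_000) -> Callable[[str], str]:
    """Wrap a model call with basic input and output checks."""
    def call(user_input: str) -> str:
        # Input guardrail: refuse empty or oversized requests before spending tokens.
        if not user_input.strip() or len(user_input) > max_input_chars:
            return FALLBACK
        try:
            answer = model(user_input)
        except Exception:
            # Any failure in the model call degrades to the fallback message.
            return FALLBACK
        # Output guardrail: never return a blank answer to the user.
        return answer if answer.strip() else FALLBACK
    return call
```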

Evaluation

Golden set + LLM-judge for correctness & groundedness. No green eval, no launch.
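A launch gate along these lines can be sketched as follows. The judge is passed in as a plain callable, so no particular model API is assumed. The names Verdict and run_eval and the two threshold defaults are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Verdict:
    correct: bool    # answer matches the expected output
    grounded: bool   # answer is supported by the provided knowledge


def run_eval(golden: Sequence[Tuple[str, str]],
             system: Callable[[str], str],
             judge: Callable[[str, str, str], Verdict],
             min_correct: float = 0.9,
             min_grounded: float = 0.95) -> dict:
    """Score `system` on the golden set and decide whether the eval is green."""
    if not golden:
        # An empty golden set can never justify a launch.
        return {"correct": 0.0, "grounded": 0.0, "green": False}
    verdicts = [judge(q, expected, system(q)) for q, expected in golden]
    n = len(verdicts)
    correct = sum(v.correct for v in verdicts) / n
    grounded = sum(v.grounded for v in verdicts) / n
    green = correct >= min_correct and grounded >= min_grounded
    return {"correct": correct, "grounded": grounded, "green": green}
```

In practice the judge callable would wrap a call to the stronger judging model. Keeping it as a parameter lets the gate itself be tested with a trivial stub.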

Deploy, Monitor & Hand-off

Smallest useful surface, every request logged, budget alarm, runbook, re-eval cadence. Maintainable by someone other than its author.
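Per-request logging and a budget alarm can share one small component, sketched below. The class name RequestLog, the field names and the 80% alarm fraction are assumptions. A real deployment would write records to its own log store rather than an in-memory list.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class RequestLog:
    monthly_budget_usd: float
    alarm_fraction: float = 0.8          # alarm once this share of budget is spent
    spent_usd: float = 0.0
    records: list = field(default_factory=list)

    def log(self, request_id: str, model: str, cost_usd: float) -> bool:
        """Record one request; return True if the budget alarm should fire."""
        self.spent_usd += cost_usd
        self.records.append(json.dumps({
            "ts": time.time(),
            "request_id": request_id,
            "model": model,
            "cost_usd": cost_usd,
        }))
        return self.spent_usd >= self.alarm_fraction * self.monthly_budget_usd
```

Returning the alarm state from log keeps the check on the request path, so the alarm cannot be skipped by a request that was never logged.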