A repeatable method for our delivery team: the seven phases every AI engagement runs through, the exact questions we answer at each one, and a worked mini-project — from the client's brief to picking the model, deciding RAG-or-not, and proving it works.
We sell AI delivery at a low price point. That only works if the method is repeatable and nobody re-invents decisions on every job. Bigger model, RAG, fine-tuning, more infra — each is a cost you justify with a number on an eval scorecard, never a hunch.
One-sentence problem. 5–10 example I/O pairs (these become the eval). What's out of scope.
What knowledge does the answer need, is it stable or live, and where does it live?
Prompt-only, long-context, RAG, or fine-tune? Follow the decision tree, cheapest first.
Start at the cheapest tier. Climb only when the eval forces it. Judge with a stronger model.
Smallest end-to-end version of one real example — with guardrails from day one.
Golden set + LLM-judge for correctness & groundedness. No green eval, no launch.
Smallest useful surface, every request logged, budget alarm, runbook, re-eval cadence. Maintainable by someone other than its author.