These are the two choices juniors get wrong most. Make both with a flowchart and the eval — never a hunch. Cheapest option first, every upgrade justified by a number.
RAG = before answering, fetch the few most relevant chunks of the client's knowledge and put them in the prompt. Walk this tree top to bottom and stop at the first match.
prompt < long-context < RAG < fine-tune. Most business projects land on RAG. Fine-tuning is rare — it teaches behaviour, not facts, and a good RAG + prompt usually wins for far less effort.
Rate card & method are small + stable → pinned in the prompt (long-context); past projects grow and similarity matters → RAG over them; it's facts not a writing style (not fine-tune). → pinned + RAG.
sentence-transformers model (e.g. all-MiniLM-L6-v2). Runs on CPU, $0 per call, no lock-in. Good enough for small/medium KBs.Default to the Claude family and pick by tier. Verify live pricing in the Anthropic docs before quoting a client — IDs and tiers below, prices change.
| Tier | Model ID | When to use |
|---|---|---|
| cheap / fast | claude-haiku-4-5-20251001 | Our default. High-volume, well-scoped tasks: classification, extraction, grounded RAG answers, drafting. |
| balanced | claude-sonnet-4-6 | Harder reasoning, multi-step, ambiguous inputs, agent loops. Also a solid eval judge. |
| most capable | claude-opus-4-8 | Hardest reasoning / agents / edge cases — or a strict eval judge over a cheaper shipped model. |
We started on Haiku and ran the eval — it under-structured the proposals and mis-anchored estimates. Drafting a coherent, grounded proposal is real reasoning, so we climbed one tier to claude-sonnet-4-6 and judge with claude-opus-4-8. The eval, not the gut, decides.