AI March 20, 2023 3 min read

Prompt engineering: the patterns that actually matter in practice

A grounded overview of the prompt techniques that produce reliable results, and the ones that sound sophisticated but do not hold up in production.

When GPT-4 arrived, the term "prompt engineer" seemed to be everywhere almost overnight. Some people treated it as a serious discipline; others dismissed it as a temporary workaround until models got smarter. My view is somewhere in between: most of what gets called prompt engineering does not need specialised expertise, but a handful of concrete patterns do make a real difference and are worth understanding if you are building products with LLMs.

Here is what I actually use, and what I do not.

The patterns that reliably help

Be explicit about format. The single most effective intervention in most prompts is specifying exactly what output format you want. "Return a JSON object with these fields" or "respond in bullet points with no more than five items" consistently produces better results than leaving it implicit. Models default to verbose prose. Specifying format removes ambiguity.
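To make this concrete, here is a minimal Python sketch of an explicit format request paired with a check that the reply actually follows it. The sentiment task, the field names and the helper names are my own illustrative assumptions, not part of any particular API.

```python
import json

# Hypothetical fields, chosen only for illustration.
REQUIRED_FIELDS = {"sentiment", "confidence"}


def build_prompt(review: str) -> str:
    """Ask for a strictly specified JSON object instead of free prose."""
    return (
        "Classify the sentiment of the review below.\n"
        "Return only a JSON object with exactly these fields: "
        '"sentiment" ("positive", "negative" or "neutral") and '
        '"confidence" (a number between 0 and 1).\n\n'
        f"Review: {review}"
    )


def parse_response(raw: str) -> dict:
    """Reject any reply that does not match the requested format."""
    data = json.loads(raw)  # raises ValueError if the reply is not JSON
    if not isinstance(data, dict) or set(data) != REQUIRED_FIELDS:
        raise ValueError(f"unexpected reply shape: {data!r}")
    return data
```

Stating the format in the prompt and enforcing it in code go together: the parser turns "the model drifted back into prose" into an error you can count rather than a surprise downstream.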

Give the model a role when it matters. "You are a senior software engineer reviewing code for correctness and security" outperforms "review this code." The role provides a consistent frame for what "good" means. This is not magic - it works because the training data contains examples of role-appropriate behaviour. The effect is most pronounced for specialised domains.

Show examples for structured tasks. For tasks that require consistent output structure - classification, extraction, summarisation with specific constraints - one or two examples in the prompt ("few-shot prompting") dramatically reduce inconsistency. This is more reliable than long instructions that describe the format abstractly.
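A small sketch of how a few-shot prompt can be assembled in Python. The support-ticket task, the labels and the example texts are invented for illustration; in practice you would use examples from your own data.

```python
# Hypothetical labelled examples; replace with real cases from your domain.
EXAMPLES = [
    ("The checkout page crashes when I apply a coupon.", "bug"),
    ("Could you add a dark mode?", "feature_request"),
]


def build_few_shot_prompt(ticket: str, examples=EXAMPLES) -> str:
    """Show the model worked examples, then leave the last label open."""
    lines = ["Classify each support ticket as bug, feature_request or question.", ""]
    for text, label in examples:
        lines += [f"Ticket: {text}", f"Label: {label}", ""]
    # The final entry repeats the example layout exactly, minus the answer.
    lines += [f"Ticket: {ticket}", "Label:"]
    return "\n".join(lines)
```

The point of the design is that the new input is formatted exactly like the examples, so the most natural continuation is a single label in the same style.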

Chain steps for complex tasks. For multi-step reasoning tasks, breaking the task into explicit steps and asking the model to work through them one at a time produces more reliable results than asking for the final answer directly. This is the intuition behind "chain of thought" prompting. It is not always necessary, but for complex reasoning it is a meaningful improvement.
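One way to apply this, sketched in Python under the assumption that each step is sent as its own call and earlier results are passed forward. Here `model` is a placeholder for whatever function sends a prompt to your LLM and returns its text; it is not a real library call.

```python
from typing import Callable, List


def run_chain(model: Callable[[str], str], task: str, steps: List[str]) -> List[str]:
    """Run each step as a separate call, feeding earlier results forward."""
    results: List[str] = []
    for i, step in enumerate(steps, start=1):
        parts = [f"Task: {task}"]
        # Include the output of every step completed so far.
        parts += [f"Step {n} result: {r}" for n, r in enumerate(results, start=1)]
        parts.append(f"Now do step {i}: {step}")
        results.append(model("\n".join(parts)))
    return results
```

The same idea also works inside a single prompt by listing the steps and asking the model to answer them in order; separate calls simply make each intermediate result easier to inspect.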

The patterns that are overhyped

Excessive flattery and motivation. Some prompt guides suggest opening with generic praise ("You are an expert in...") or adding urgency ("Think step by step, this is very important"). Unlike a specific role or explicitly laid-out steps, these add little: the effect is mostly marginal. A clear task description and explicit format requirements matter far more.

Extremely long system prompts with many rules. There is a ceiling on how many instructions a model reliably follows at once. In a prompt with twenty specific rules, several will usually be ignored. Fewer, clearer constraints work better than comprehensive rule lists.

Treating prompts as fixed. The most effective use of prompt engineering is iterative: build a test set of representative inputs, measure outputs against it, and adjust the prompt based on where it fails. Treating the prompt as something you write once and never revise means you are not actually engineering anything.

A practical approach

When I start building an LLM feature, I follow a simple sequence: write the simplest possible prompt, run it against ten to twenty representative examples, identify the three most common failure modes, then add targeted instructions to address those specific failures. Repeat until the failure rate is acceptable.

The evaluation set is the most important investment. Without it, you are guessing.
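A minimal sketch of that loop in Python, assuming exact-match scoring against expected outputs. The `model` argument is again a placeholder for your own LLM call, and the `{input}` placeholder in the template is a convention I made up for this example.

```python
from collections import Counter
from typing import Callable, List, Tuple


def evaluate(
    model: Callable[[str], str],
    prompt_template: str,
    cases: List[Tuple[str, str]],
):
    """Return the failure rate and the three most common failure modes."""
    failures: Counter = Counter()
    for text, expected in cases:
        # Plain replace, so the template may contain literal braces (e.g. JSON).
        got = model(prompt_template.replace("{input}", text)).strip()
        if got != expected:
            failures[(expected, got)] += 1
    rate = sum(failures.values()) / len(cases)
    return rate, failures.most_common(3)
```

Each failure mode is recorded as an (expected, got) pair, which is usually enough to see what kind of instruction to add next. Rerun the same cases after every prompt change so the numbers stay comparable.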
