Posts tagged with production-llm

The Prompt Engineer · Jul 20 · 5 min read

The Comment That Passed Code Review

Cloudflare's threat intelligence team found something odd in March: malicious Workers scripts stuffed with thousands of lines of commented-out text, all...

prompt-injection code-review ai-security

The Prompt Engineer · Jul 19 · 4 min read

The Prompt Is the Last Five Percent

I spent three days last month rewriting a system prompt for a code-review agent. Tried persona frames, XML structure, numbered constraints, the works.

context-engineering prompt-engineering agent-architecture

The Prompt Engineer · Jul 18 · 5 min read

Ask for Five, Pick One

Run this experiment. Ask Claude, GPT, or Gemini to "write a short joke about coffee."

verbalized-sampling mode-collapse output-diversity

The Prompt Engineer · Jul 17 · 4 min read

JSON Made It Dumber

You add response_format: { type: "json_object" } to a working prompt. Accuracy drops 10 points.

format-tax structured-output constrained-decoding

The Prompt Engineer · Jul 16 · 4 min read

Your System Prompt Isn't in Charge

Every LLM-powered product in production right now assumes one thing: the system prompt wins.

instruction-hierarchy prompt-injection reasoning-models

The Prompt Engineer · Jul 15 · 4 min read

Drop the Examples

Every prompt engineering tutorial from the last three years drilled the same lesson: show the model what you want.

few-shot zero-shot-cot chain-of-thought

The Prompt Engineer · Jul 14 · 4 min read

Twenty Samples, Half a Percent

I audited a scoring pipeline last week that was sampling every request twenty times and taking the majority vote.

self-consistency majority-voting inference-cost

The Prompt Engineer · Jul 13 · 5 min read

Blame the Eval, Not the Prompt

A two-word change in your prompt drops accuracy by fifteen points. You've seen the claims.

prompt-sensitivity llm-evaluation benchmarks

The Prompt Engineer · Jul 12 · 5 min read

Words Made It Worse

Something about prompt engineering has been bugging me. We tell people "make the model think step by step" as if that's always the right advice.

chain-of-symbol spatial-reasoning prompt-engineering

The Prompt Engineer · Jul 10 · 5 min read

The Trace Is Lying to You

Three papers dropped in the last month that, taken together, tell a story most prompt engineers don't want to hear: the reasoning trace your model produces...

chain-of-thought reasoning-faithfulness commitment-boundary

The Prompt Engineer · Jul 9 · 5 min read

Make It Answer Before It Answers

Turn one, the customer-support agent nails it — polite, on-policy, cites the right documentation.

arq instruction-following structured-reasoning

The Prompt Engineer · Jul 8 · 5 min read

30,000 Tokens Before Hello

Claude Fable 5 burns 30,000 tokens of system instructions before you type a single character.

system-prompt prompt-architecture production-llm

The Prompt Engineer · Jul 7 · 4 min read

Field Names Are Instructions

Somebody ran GPT-4o-mini on GSM8K — grade-school math, the kind LLMs are supposed to be good at — and got 31.8% accuracy.

structured-output constrained-decoding json-schema

The Prompt Engineer · Jul 5 · 5 min read

Your Model Thinks Until You Stop It

Every reasoning model ships with the same default: think as hard as you can, every time.

thinking-tokens reasoning-budget cost-optimization

The Prompt Engineer · Jul 4 · 4 min read

More Context Made It Dumber

Last month I watched a team migrate their RAG pipeline from 32K context to a shiny new 1M-token model.

context-rot context-window reasoning-degradation

The Prompt Engineer · Jun 2 · 5 min read

Your Agent Is Paying Full Price Every Turn

Most prompt engineering advice focuses on what to say to the model.

prompt-caching agentic-systems cost-optimization

The Prompt Engineer · Jun 1 · 4 min read

Effort Ate My Prompt

Three days ago, Anthropic shipped Claude Opus 4.8.

effort-levels claude-opus-4-8 prompt-engineering

The Prompt Engineer · May 29 · 4 min read

Three Edits Beat a Full Rewrite

Most prompt engineers don't know when to stop editing. You tweak the system message, run ten test cases, change three words, run again.

prompt-optimization automated-prompting dspy

The Prompt Engineer · May 26 · 4 min read

The Prompt Got Demoted

Last week I spent three hours debugging a RAG agent that kept hallucinating company policy details.

context-engineering prompt-engineering anthropic

The Prompt Engineer · May 23 · 4 min read

Drop Your Examples

Every prompt engineering guide from 2023 to mid-2025 hammered the same advice: give the model 3-5 worked examples, then ask your question.

few-shot chain-of-thought reasoning-models
