Cost to Implement vs Cost to Verify
The Wrong Scoreboard
For the past year, the discourse on coding agents has obsessed over the wrong question. The focus has been on what models can do: lines written, autonomous minutes, benchmark scores, model cards, percent of shipped lines written by AI. These are all generalized measures of implementation throughput. They are useful for a bird's-eye view of model progress, but they say almost nothing about where the actual bottlenecks now live. The operative question for practitioners in 2026 is not what tools can do; it's what you should ask them to do.
