AI coding assistants have made natural language a far more powerful front-end for software development. That practical shift does not refute Dijkstra’s critique of natural-language programming in EWD667, originally published in 1979 with the title “On the foolishness of ‘natural language programming.’”
One could argue that English has become a de facto programming language. The narrower and perhaps more defensible claim is different: current coding agents are powerful systems for natural-language-guided software construction. They are good at synthesis, modification, search, and local repair under empirical feedback. They usually do not construct programs in Dijkstra’s stronger sense: formal artifacts designed to be understood and justified through disciplined reasoning. That distinction clarifies what has changed and what has not.
Dijkstra argued that the obligation to use formal symbolism was not an arbitrary burden imposed by stubborn machines. Formal notation was part of what made serious reasoning possible in the first place. Modern AI coding assistants are impressive enough to force a re-reading of that argument. They do not, however, overturn it.
What Dijkstra Was Actually Objecting To
It is easy to flatten Dijkstra’s position into the complaint that English is ambiguous. That is obviously true, and secondary. In EWD667, Dijkstra’s deeper claim is that formal symbolisms matter because they turn programming into a rule-governed activity. They specify the legal moves. That is what makes manipulations governable and rules out large classes of nonsense before one even begins to argue about whether a result is good.
Dijkstra’s point about “natural language programming” was not simply that natural languages are messy. Programming gains much of its leverage from not being conducted in an unconstrained medium. Formal systems make reasoning tractable because they narrow the interface between intention and execution. They do not merely encode a finished thought; they help produce one by making it harder to smuggle confusion through the process.
That argument still bites. The problem is not only that English leaves room for multiple interpretations. Informal discourse does not, by itself, provide a privileged notion of correctness. If a workflow lacks a stable formal object against which claims can be checked, ambiguity is only the surface symptom. The deeper problem is the absence of an authoritative structure that rules certain moves illegitimate.
What Current Coding Agents Actually Do
The public documentation for mainstream coding agents describes a workflow that is far more concrete than “just ask the AI.” GitHub Copilot coding agent, for example, is presented as a background worker that can fix bugs, implement incremental features, improve test coverage, update documentation, and address technical debt within a repository. GitHub’s docs also emphasize that the agent works with repository-scoped context, opens a pull request, and then requests human review; workflow runs may require explicit approval before they execute on the proposed branch. GitHub’s best-practices guidance is equally revealing: good results depend on clear issue descriptions, acceptance criteria, directions about which files to change, and statements about whether unit tests are expected.
The same pattern appears elsewhere. Anthropic’s Claude Code documentation describes a system organized around permissions, tooling, and subagent configuration. Its permission model is defined around operations such as Read, Edit, and Bash. That is a neat summary of what these systems are in practice: codebase-navigation and code-transformation agents operating inside a controlled development environment.
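The idea of operation-scoped permissions can be made concrete with a small sketch. This is a hypothetical illustration of the concept, not Claude Code’s actual configuration format or API; the rule names mirror the operations named in the docs, but the matching logic and patterns here are invented:

```python
# Hypothetical sketch of an operation-scoped permission gate for a coding
# agent. The operation names (Read, Edit, Bash) mirror those in the docs;
# ALLOW_RULES and the matching logic are invented for illustration.
from fnmatch import fnmatch

ALLOW_RULES = {
    "Read": ["src/**", "tests/**"],
    "Edit": ["src/**"],
    "Bash": ["pytest*", "ruff*"],
}

def is_allowed(operation: str, target: str) -> bool:
    """Return True if the (operation, target) pair matches an allow rule."""
    patterns = ALLOW_RULES.get(operation, [])
    return any(fnmatch(target, pattern) for pattern in patterns)
```

The point of the sketch is the shape of the control, not the details: the agent’s moves are gated per operation, which is exactly what makes it a controlled development environment rather than an open-ended executor.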
That is an important shift in software practice. A capable agent can absorb a great deal of local context, generate candidate changes quickly, and iterate with far less friction than a human manually moving among editor, shell, browser, test runner, and documentation. For many tasks, the job is now to steer, inspect, and accept or reject, not to type every token by hand.
Still, none of that is the same thing as saying that natural language has become the programming medium in Dijkstra’s sense. The official workflows are built around repositories, tools, tests, diffs, review, and deployment controls. They are not built around deriving implementations from formal mathematical specifications and proving them correct. Mainstream agents usually optimize for passing tests, satisfying examples, matching surrounding code, surviving type checks and linters, and pleasing the human reviewer.
That is not a criticism masquerading as a definition. It is simply what the systems are for.
The Prompt Is Not the Program
Once the performance of current agents becomes salient, a conceptual confusion appears almost immediately. A user gives the agent a prompt. The agent uses repository context, files, tools, and tests. A working artifact comes out. It becomes tempting to say that the whole bundle is now “the program.” That is the wrong abstraction.
A prompt is an informal steering signal. It is an instruction to a program-construction process. It can be excellent or terrible, precise or vague, but it is not itself the produced software artifact.
The surrounding loop is also not the program. Prompt plus repository context plus tools plus tests plus examples plus human review is better understood as a development process or control loop that may produce a program. It is the mechanism by which a human and an agent search over candidate artifacts, discard bad ones, and refine promising ones.
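The control-loop view can be sketched directly. In the sketch below, every callable is a placeholder for real machinery (the agent, the test harness, the reviewer); none of it is a real API:

```python
# Hypothetical sketch of the prompt-driven construction loop described above.
# propose_change, run_checks, and human_accepts stand in for the agent, the
# empirical checks, and the reviewer; all three are invented placeholders.

def construct(prompt, repo, propose_change, run_checks, human_accepts,
              max_rounds=5):
    """Search over candidate artifacts; return the accepted one, or None."""
    feedback = None
    for _ in range(max_rounds):
        # The prompt is a steering signal into the process, not the output.
        candidate = propose_change(prompt, repo, feedback)
        ok, feedback = run_checks(candidate)   # empirical probes, not a proof
        if ok and human_accepts(candidate):
            return candidate                   # the artifact is the program
    return None
```

Note where the program appears in this sketch: it is the returned candidate, the artifact that survives the loop. The prompt and the loop itself are the process that produced it.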
The program, in the ordinary sense, is still the resulting codebase, binary, configuration, or other executable artifact that is eventually maintained, reasoned about, deployed, and held responsible for behavior. In Dijkstra’s stronger sense, the program is not merely something that runs. It is something that can be understood as a disciplined construction, ideally one whose correctness arguments are not accidental afterthoughts.
Prompting a coding agent therefore looks less like programming in the strict Dijkstra sense and more like an informal control language for software construction. That control language is valuable. It is often surprisingly expressive. It may already have become central to everyday engineering. But it is not identical to the formal artifact that results from the process.
Search Under Feedback Is Not Derivation From Specification
The dominant agent workflow today is empirical. An agent proposes a change, runs checks, looks at failures, revises the change, and repeats. The checks might include unit tests, integration tests, linters, type systems, golden examples, screenshots, benchmarks, and human inspection. In tightly instrumented settings, the loop can be remarkably effective.
That effectiveness should not be mistaken for formal derivation. When a coding agent succeeds because tests pass, the system has not usually demonstrated correctness in any strong mathematical sense. It has shown compatibility with a partial set of probes. Sometimes that is enough. Sometimes it is the only thing that is economically sensible. It is still a different epistemic situation from having a formal specification and a proof.
A familiar example makes the gap plain. An agent can update an API, satisfy its tests, and still miss the invariant that only administrators may flip a certain state transition, because the test suite never encoded that rule exhaustively. The code may be acceptable in practice after review. It has not been derived from an authoritative formal account of what the system must do.
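A toy version of that gap, with invented names: the test suite checks one happy path, so an implementation that ignores the admin-only rule still passes.

```python
# Hypothetical example: a state transition that should be admin-only.
# The implementation below satisfies the (incomplete) test suite while
# ignoring the role check entirely.

def approve_payout(user_role: str, status: str) -> str:
    """Intended invariant: only admins may move 'pending' to 'approved'."""
    if status == "pending":
        return "approved"   # bug: user_role is never consulted
    return status

# The only case the suite encodes happens to use an admin, so it passes:
assert approve_payout("admin", "pending") == "approved"

# The invariant the suite never probes: a viewer can also "approve",
# because approve_payout("viewer", "pending") returns "approved" too.
```

An agent optimizing for the green test run has no reason to notice the missing check; only a reviewer who knows the intended invariant, or a test that encodes it, will catch it.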
Most mainstream agent workflows therefore lack a privileged formal notion of correctness in the first place. The test suite is not a proof. An issue description is not a specification. A few examples are not a semantics. A satisfied reviewer is not a theorem.
There is also a subtler complication. From an agent’s perspective, a formal specification is often just another artifact in the workspace unless a human has already made it authoritative. If the user says, “build what I mean,” and there is no trusted formal spec, the agent must still synthesize structure from informal human intent. Even if the workflow inserts a formal specification later, that specification must itself be created, validated, and maintained. The hard interpretive step has not vanished. It has merely moved.
One sometimes hears that AI will soon close the loop by translating informal natural language directly into formal specifications and then into verified implementations. Perhaps, in narrow domains, parts of that story will become routine. But unless the formal specification is itself grounded as the authoritative statement of intent, the workflow has only added one more generated artifact whose adequacy still depends on informal judgment.
None of this makes current agents weak. It locates their strength correctly. They are excellent at guided search in spaces where success can be tested cheaply enough, where conventions and examples carry a great deal of hidden structure, and where human reviewers can catch the remaining category errors.
Karpathy and the Workflow Shift
Andrej Karpathy’s recent public comments on code agents are useful evidence here, but only if read carefully. The interesting part is not the headline-friendly claim that one can get away with barely typing code anymore. The interesting part is that the workflow has changed enough for a strong practitioner to spend significant time steering agents, arranging feedback loops, and working at a larger granularity than individual lines.
Karpathy’s autoresearch repository makes the point more cleanly than many abstract arguments do. The setup gives an agent a small training environment, lets it modify code, evaluates the result on a concrete validation metric, and keeps or discards changes based on that feedback. Karpathy explicitly describes the human role as editing the program.md instructions while the agent iterates on train.py under a fixed experimental budget. That is a vivid example of labor shifting away from direct code entry and toward designing, steering, and evaluating an automated search process.
But it is not a philosophical refutation of Dijkstra. If anything, it illustrates the distinction. autoresearch works precisely because there is a measurable loop with a concrete metric, a bounded surface area, and a mechanically checkable notion of local improvement. The achievement is real, but its logic is empirical optimization under feedback, not proof-oriented program construction out of natural language.
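The keep-or-discard logic described above is, in skeleton form, greedy optimization under a metric. A minimal sketch, where `mutate` and `evaluate` are invented stand-ins rather than anything from the autoresearch repository:

```python
import random

# Skeletal keep-or-discard loop in the spirit of the setup described above:
# propose a change, score it on a validation metric, keep it only if it
# improves. mutate() and evaluate() are invented placeholders.

def optimize(initial, mutate, evaluate, budget=100, seed=0):
    """Greedy search: accept a mutation only when the metric improves."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(budget):              # fixed experimental budget
        candidate = mutate(best, rng)
        score = evaluate(candidate)
        if score > best_score:           # mechanically checkable improvement
            best, best_score = candidate, score
    return best, best_score
```

Everything that makes this loop work is formal-free but machine-checkable: a number goes up or it does not. That is exactly the property that makes the workflow effective, and exactly why it is empirical optimization rather than derivation from a specification.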
Karpathy’s examples are evidence of a workflow shift, not closure in an older philosophical dispute. They show that natural language has become a much more powerful front-end for controlling software work. They do not show that English has inherited the role of a formal programming language.
Where Dijkstra Still Wins, and Where Practice Has Changed
Dijkstra was right about the central intellectual point. Formal programming gains power by constraining what counts as a legitimate step and by making reasoning manageable. Current coding agents do not abolish that fact. They mostly help humans work around it by making informal steering vastly more productive and by exploiting empirical checks, repository structure, and human review.
Practice has still changed, and it would be silly to deny it. For a large class of engineering work, full formal proof is neither necessary nor cost-effective. A migration script, a feature-flag rollout, a build-system fix, a user-interface refactor, or a narrow internal service can often be built responsibly with tests, examples, code review, and operational monitoring. In those settings, coding agents deliver real leverage.
What should be resisted is the conceptual slide. Empirical testing and review are often sufficient in practice, but sufficiency in practice is not the same as disciplined formal construction. Passing tests is not the same as possessing a proof. Steering an agent with prose is not the same as replacing formal programming with English.
That leaves a cleaner conclusion. AI coding assistants do not vindicate natural-language programming in Dijkstra’s strong sense. They make natural language a more powerful control surface for software construction. That is a major change in software development. It is not the same as turning natural language into the formal medium in which programs are correctly constructed and justified.
References
- Edsger W. Dijkstra, On the foolishness of “natural language programming” (EWD667)
- GitHub Docs, About GitHub Copilot coding agent
- GitHub Docs, Best practices for using GitHub Copilot to work on tasks
- Claude Code Docs, Claude Code settings
- No Priors, Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI
- Andrej Karpathy, karpathy/autoresearch