In Development · Early Access

Submit a goal.
Ship production-ready code.

AnvilWorks is a stateful coding execution engine — CLI-first, scriptable, and chainable into any automation pipeline. From a single bug fix to a multi-month application build.

Point it at a repository, give it a natural-language goal, and get back committed, tested, production-ready code. Use it at your terminal, in a script, or from anything that can POST to an HTTP endpoint.

Two Operating Modes

One engine. Any scale.

Single-job mode
Quick work, committed fast.

Bug fixes, enhancements, refactors. Typically done in minutes. AnvilWorks implements the change, validates it, and commits it to git. No babysitting required.

$ anvilworks run -p /srv/api \
  "Add rate limiting to login endpoint"
◌ Clarifying acceptance criteria...
◌ Implementing...
✓ Tests passed (14/14)
✓ Committed: a4f7c9d · feat/rate-limit
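Because the CLI is scriptable, the single-job invocation above can also be driven from a program and chained. The Python sketch below relies only on the `anvilworks run -p <path> <goal>` form shown in the demo. The helper names are hypothetical, and the sketch assumes (unconfirmed) that the CLI exits non-zero when a job fails.

```python
import shlex
import subprocess


def build_run_command(project_path: str, goal: str) -> list[str]:
    """Argv for `anvilworks run -p <path> <goal>`, the form shown in the demo."""
    return ["anvilworks", "run", "-p", project_path, goal]


def run_goals(project_path: str, goals: list[str]) -> None:
    """Run single-job goals one after another against the same repository.

    Assumption: the CLI exits non-zero on failure. With check=True that raises
    CalledProcessError, so the chain stops at the first failed goal.
    """
    for goal in goals:
        subprocess.run(build_run_command(project_path, goal), check=True)


if __name__ == "__main__":
    cmd = build_run_command("/srv/api", "Add rate limiting to login endpoint")
    # Show the equivalent shell command without running it.
    print(shlex.join(cmd))
```

Run as a script, it only prints `anvilworks run -p /srv/api 'Add rate limiting to login endpoint'`; calling `run_goals` is what actually invokes the CLI.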
Project mode
Full application builds.

Submit a product requirements document (PRD). AnvilWorks generates a plan, executes its phases in dependency order, commits each one, and pauses at the approval gates you configure. Days to weeks, unattended.

anvilworks / project plan · active
Phase 1 · Foundation
Phase 2 · Data Layer
Phase 3 · API Surface
Phase 4 · Frontend
Phase 5 · Deploy
The Methodology

Why the output is trustworthy.

Every task — regardless of size — passes through the same five stages. AnvilWorks produces code that passes its own validation, or tells you exactly why it couldn't.

01
Clarify
Generates concrete, testable success criteria from the natural-language goal before writing a single line of code.
02
Implement
An AI coding agent works directly in the real codebase with full filesystem access. No sandboxing, and no stubs allowed.
03
Auto-validate
Tests run. Structure checked. Banned patterns (mocks, stubs, placeholder TODOs) flagged and rejected.
04
Quality gate
An LLM evaluates the actual implementation against the success criteria generated in stage 01. Not vibes: criteria.
05
Repair
Up to three targeted fix iterations if validation fails. Surfaces the specific failure with full context if it can't self-correct.

No silent mocks. No stubs that pass CI and break in production. AnvilWorks either produces code that passes its own validation — or surfaces the specific failure with full context for human review.
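The five stages amount to a bounded validate-and-repair loop. The Python sketch below is an illustration under stated assumptions: each stage is a caller-supplied function (hypothetical names) that returns `None` on success or a failure description, and the banned-pattern check is a deliberately naive regex scan, not AnvilWorks' actual validator. Only the limit of three repair iterations comes from the description above.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

MAX_REPAIRS = 3  # "up to three targeted fix iterations"

# Naive stand-in for stage 03's banned-pattern check (assumption).
BANNED = re.compile(r"\b(TODO|FIXME|NotImplementedError|mock|stub)\b", re.IGNORECASE)


def find_banned(source: str) -> list[str]:
    """Return the distinct banned tokens found in a source string, sorted."""
    return sorted({m.group(0) for m in BANNED.finditer(source)})


@dataclass
class Outcome:
    passed: bool
    rounds: int  # validation rounds run: the initial attempt plus any repairs
    failures: list[str] = field(default_factory=list)


Check = Callable[..., Optional[str]]  # returns None on success, else a failure description


def run_task(goal: str, clarify: Callable[[str], Any], implement: Callable[[str, Any], Any],
             validate: Check, quality_gate: Check, repair: Callable[[Any, str], Any]) -> Outcome:
    criteria = clarify(goal)                # 01 Clarify: goal -> success criteria
    change = implement(goal, criteria)      # 02 Implement
    failures: list[str] = []
    for round_no in range(1, MAX_REPAIRS + 2):
        # 03 Auto-validate, then 04 Quality gate (only reached if validation passed).
        problem = validate(change) or quality_gate(change, criteria)
        if problem is None:
            return Outcome(True, round_no, failures)
        failures.append(problem)
        if round_no <= MAX_REPAIRS:
            change = repair(change, problem)  # 05 Repair, at most MAX_REPAIRS times
    # Could not self-correct: surface every failure for human review.
    return Outcome(False, MAX_REPAIRS + 1, failures)
```

Keeping every failure in `Outcome.failures` is what lets an unrecoverable task be handed to a human with full context rather than silently passing.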

Capabilities

What it manages so you don't have to.

Stateful memory
Rolling context file per project. Every job knows what came before — prior decisions, patterns, implementations.
Failure recovery
Classifies failures as retryable, fixable, blocked, or escalation-worthy. Acts accordingly without prompting.
Human inbox
Pauses and creates an escalation item only when it genuinely can't proceed. Resumes automatically on reply.
Git integration
Auto-branches and auto-commits after each completed job. Creates PRs on command. Configurable per project.
Model agnostic
Any OpenAI-compatible endpoint — Ollama, Claude, hosted providers. Swap without touching your workflow.
Live streaming
A Server-Sent Events (SSE) endpoint streams job output in real time. Watch it work or pipe output to your own tooling.
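The failure-recovery behavior can be pictured as a dispatch from failure class to next action. A minimal Python sketch: the four class names come from the description above, but the example causes, the action names, and the retry limit are assumptions.

```python
from enum import Enum


class FailureKind(Enum):
    RETRYABLE = "retryable"    # e.g. a transient model or network error (assumed example)
    FIXABLE = "fixable"        # e.g. a failing test the agent can repair (assumed example)
    BLOCKED = "blocked"        # e.g. waiting on another job or resource (assumed example)
    ESCALATE = "escalation"    # needs a human decision


def next_action(kind: FailureKind, retries_used: int, max_retries: int = 3) -> str:
    """Choose the next step automatically; only the "escalate" action involves a human."""
    if kind is FailureKind.RETRYABLE:
        # Retry transient failures, but escalate if they keep recurring.
        return "retry" if retries_used < max_retries else "escalate"
    if kind is FailureKind.FIXABLE:
        return "repair"    # hand off to the repair stage
    if kind is FailureKind.BLOCKED:
        return "wait"      # park the job until the blocker clears
    return "escalate"      # create a human inbox item; resume on reply
```

Capping retries keeps a persistently "transient" failure from looping forever: it turns into an inbox item instead.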
REST API

API-ready for pipeline integration.

The CLI wraps a local REST daemon running on port 7821. Every command is also an HTTP endpoint — callable from internal tools, scripts, CI pipelines, or other AI systems.

POST /jobs · Submit a single goal against a project path
GET /jobs/{id}/stream · SSE stream of live job output
POST /projects · Submit a PRD and get a multi-phase execution plan
GET /projects/{id}/plan · Full phase and task breakdown with status
POST /inbox/{id}/reply · Answer an escalation and resume execution
POST /projects/{id}/pr · Create a pull request for completed work
# ~/.anvilworks/config.toml
port = 7821 # daemon listens here
model = "claude-sonnet-4-6" # served via any OpenAI-compatible endpoint
auto_commit = true # git commit after each job
In Development

Not yet generally available.

AnvilWorks currently powers Steward, ElfTech's autonomous product ownership system. Commercial access is coming. Tell us about your use case and we'll reach out when access opens.