Every collective noun for animals captures something true about how they move and behave together. A murder of crows isn't just a group — it names the coordinated, intelligent behavior that makes them formidable as a collective. We chose maturity deliberately.
Armature's agents don't just coordinate to complete a task. After every run, the system collects execution traces, runs the DiagnosticAnalyzer against them, and uses the SpecRefiner to rewrite the YAML sections that underperformed. The next run is better. Your workflow doesn't just run. It matures.
“A murder of crows is more dangerous than one.
A maturity of agents is smarter than before.”
Armature isn't a run-once tool. It's a loop. Each step feeds the next, and the next run is smarter than the last.
Roles aren't just labels: they determine execution order, context access, and contribution to the self-improvement health score. A well-designed maturity has all three roles: researchers, workers, and judges.
The information foundation. Researchers query tools, read context, search external sources, and build the knowledge base that downstream agents draw from. They run first — and in parallel when independent.
The production engine. Workers synthesize research into drafts, summaries, reports, code, or structured data. They consume upstream researcher output and produce the artifacts that judges and downstream workers will evaluate.
The quality gate. Judges score output quality, validate against criteria, flag hallucinations, and decide whether a result meets the bar. Only judges contribute to the quorum score in the Improvement Health Rating (IHR); they are the accountability layer.
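The role ordering described above can be pictured with a small scheduling sketch. This is not Armature's implementation: the `Agent` class, `run_agent`, `run_workflow`, and the role strings are hypothetical names chosen for illustration, and the sketch assumes every agent within a role is independent of its peers, so it does not model workers that feed other workers.

```python
import asyncio
from dataclasses import dataclass

# Assumed stage order, taken from the role descriptions:
# researchers first, then workers, then judges.
ROLE_ORDER = ("researcher", "worker", "judge")


@dataclass(frozen=True)
class Agent:
    name: str
    role: str


async def run_agent(agent: Agent, upstream: dict[str, str]) -> str:
    # Placeholder for a real model or tool call (hypothetical).
    await asyncio.sleep(0)
    return f"{agent.role}:{agent.name} saw {len(upstream)} upstream outputs"


async def run_workflow(agents: list[Agent]) -> dict[str, str]:
    outputs: dict[str, str] = {}
    for role in ROLE_ORDER:
        stage = [a for a in agents if a.role == role]
        # Each stage sees only what earlier stages produced.
        snapshot = dict(outputs)
        # Agents in the same stage run concurrently.
        results = await asyncio.gather(*(run_agent(a, snapshot) for a in stage))
        outputs.update({a.name: r for a, r in zip(stage, results)})
    return outputs
```

Calling `asyncio.run(run_workflow([...]))` with two researchers, one worker, and one judge would give the worker two upstream outputs and the judge three, whatever order the agents were listed in.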
AWS AgentCore, LangGraph, and CrewAI let you build agent workflows. Armature does that too — and then automatically improves them across runs using the Improvement Health Rating loop.
Prediction-verification closes the loop: SpecRefiner declares what it expects each rewrite to fix. The subsequent run confirms whether the fixes held — and which ones missed. The rewriter improves its own judgment over time.
Add --auto-improve to any run. When IHR drops below 0.75, Armature automatically calls SpecRefiner after execution — rewriting prompts, relaxing schemas, rebalancing model tiers, or tuning retry limits. Safe changes apply immediately; structural rewrites stage to {spec}.pending.yaml for human review.
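The gating rule above can be sketched as a small planning function. Only the 0.75 threshold and the `{spec}.pending.yaml` naming pattern come from the text; `Change`, `plan_auto_improve`, and the `structural` flag are hypothetical, and the sketch assumes `{spec}` means the spec filename without its extension.

```python
from dataclasses import dataclass
from pathlib import Path

IHR_THRESHOLD = 0.75  # from the text: improvement triggers when IHR drops below this


@dataclass(frozen=True)
class Change:
    description: str
    structural: bool  # assumption: each proposed change is tagged safe or structural


def pending_path(spec_path: Path) -> Path:
    # Assumption: "{spec}" is the filename stem, so workflow.yaml -> workflow.pending.yaml
    return spec_path.with_name(f"{spec_path.stem}.pending.yaml")


def plan_auto_improve(ihr: float, changes: list[Change], spec_path: Path) -> dict:
    """Split proposed changes into immediate and staged sets based on the IHR."""
    if ihr >= IHR_THRESHOLD:
        # Healthy run: nothing is rewritten.
        return {"apply_now": [], "staged": [], "pending_file": None}
    apply_now = [c for c in changes if not c.structural]
    staged = [c for c in changes if c.structural]
    return {
        "apply_now": apply_now,
        "staged": staged,
        # A pending file is only needed when something awaits human review.
        "pending_file": pending_path(spec_path) if staged else None,
    }
```

Keeping the decision separate from any file writing makes the rule easy to inspect: a caller could print the plan before anything touches the spec.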
Armature isn't invented from first principles — it's a synthesis of the best current academic thinking on agent harness design, published between February and May 2026, plus Microsoft's Agent Governance Toolkit, ActiveGraph's event-sourced execution model, and Veldt Labs' KYA trust layer. Every source contributed concrete, implemented capabilities.
Mature has two meanings here. The agents grow smarter every run — and the harness itself matures alongside the field, tracking the latest research as it ships.
The core finding shared across all seven: the harness is more important than the model. Armature ships the harness — production-grade, self-improving, and open source.
Write a YAML spec, point Armature at it, and watch your maturity of agents get to work.
Armature is free, MIT licensed, and built in the open. Fork it, extend it, build on it. Contributions welcome — especially new role types, tool integrations, and self-improvement strategies.