Most teams stop at "AI writes code." The interesting question is what happens after — who reviews it, who tests it, who decides it is safe to merge, and who learns from the outcome.
This post is a storyboard: six scenes that walk you from an empty terminal to a repository that operates itself. Each scene builds on the last. By the end, your agents will pick up tickets, create pull requests, review their own changes, iterate on feedback, and merge — all governed by typed safety contracts and cross-session memory.
The stack is called bstack — 24 curated agent skills across 7 layers. The foundation is the agentic control kernel, a control-systems metalayer that treats your repository as a plant and your AI agents as controllers.
Scene 1: Install the stack
Everything starts with a single command.
```bash
npx skills add broomva/bstack
```
This installs the meta-skill. From here, bootstrap all 24 skills:
```bash
bash ~/.agents/skills/bstack/scripts/bootstrap.sh
```
Verify the installation:
```bash
bash ~/.agents/skills/bstack/scripts/validate.sh
```
You now have seven layers of capability installed:
| Layer | What it gives you | First command |
|-------|------------------|---------------|
| Foundation | Safety gates, harness commands, AGENTS.md | bootstrap control metalayer |
| Memory | Cross-session context, prompt library | save this as a prompt |
| Orchestration | Agent dispatch, self-improvement | symphony init |
| Research | Deep analysis, competitive intel | deep research on X |
| Design | Glass UI, production templates | create an arcan-glass component |
| Platform | Decision tools, content pipeline | optimize this decision |
| Strategy | Risk analysis, daily briefs, decision logs | pre-mortem this project |
The layers compound. Each one you activate multiplies what the layers above can do.
Scene 2: Bootstrap the control metalayer
The control metalayer is what turns a repository from a passive code store into an active control surface.
Initialize it in any project:
```bash
python3 scripts/control_kernel_init.py . --profile governed
```
This installs four artifacts:
- `.control/policy.yaml` — setpoints with explicit metrics and thresholds
- `schemas/` — typed state, action, trace, and evaluator JSON schemas
- `METALAYER.md` — the control loop definition with plant/shield/estimator sections
- Harness gates wired to `make smoke`, `make check`, and `make control-audit`
The core law of the control kernel:
Do not grant an agent more mutation freedom than your evaluator can reliably judge.
In practice, this means your agents operate within typed boundaries. They emit control directives — not raw actions. A safety shield filters every proposed change before it reaches the codebase.
```
Plant → observe() → Runtime → update estimator → belief state
  → LLM Agent: decision(belief) → θ_t (typed directive)
  → Controller: propose(belief, θ_t) → proposed action
  → Safety Shield: filter(action, belief) → safe action + certificate
  → Plant: apply(safe action) → result
  → Evaluator/Ledger: append trace + score
```
The agent never touches the plant directly. Every mutation passes through the shield. Every mutation is logged.
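In miniature, the shield contract can be sketched like this. The directive fields, the allowed action kinds, and the certificate shape are all invented for illustration; they are not the kernel's actual API:

```python
from dataclasses import dataclass

# Hypothetical typed directive: the agent decides *what*, never mutates directly.
@dataclass(frozen=True)
class Directive:
    kind: str       # e.g. "open_pr", "edit_file"
    target: str     # path or resource the action touches
    rationale: str  # why the agent chose this

@dataclass(frozen=True)
class Certificate:
    allowed: bool
    reason: str

# Illustrative shield: deny-by-default over an invented mutation surface.
ALLOWED_PREFIXES = ("src/", "tests/")

def shield(directive: Directive) -> Certificate:
    if directive.kind not in {"open_pr", "edit_file"}:
        return Certificate(False, f"unknown action kind: {directive.kind}")
    if not directive.target.startswith(ALLOWED_PREFIXES):
        return Certificate(False, f"target outside mutation surface: {directive.target}")
    return Certificate(True, "within policy")

print(shield(Directive("edit_file", "src/app.py", "fix failing test")).allowed)  # True
print(shield(Directive("edit_file", "/etc/passwd", "oops")).allowed)             # False
```

Whatever the real schema looks like, the shape is the same: the agent's output is data, and only the shield decides whether that data becomes an action.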
Scene 3: The control loop in action
With the metalayer in place, your agent operates in a classical control loop applied to software development:
Observe. Read the current state — open issues, test results, metrics, recent commits. The belief state b_t includes not just plant observations but prior session context from the consciousness stack.
Plan. The LLM reasons about what change would improve the system. It outputs a typed control directive θ_t — not raw code, but a structured decision about what to do and why.
Act. A deterministic controller module translates the directive into a concrete action: create a branch, write code, open a PR. The safety shield validates the action against policy constraints before execution.
Verify. Harness gates run automatically — make smoke → make check → make test. If any gate fails, the agent diagnoses the failure and iterates.
Adapt. Results feed back into the belief state. The ledger records what happened. The next cycle starts with richer context.
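One pass of the loop can be written as a few lines of Python. The function bodies here are stand-ins (the real observe step reads issues and test results, and the real plan step calls the LLM); only the control structure is meant to be faithful:

```python
# Minimal sketch of observe → plan → act → verify → adapt.

def observe(state):
    # Merge fresh plant observations into the belief state (stubbed).
    return dict(state)

def plan(belief):
    # The LLM would emit a typed directive here; we hard-code one.
    return None if belief["tests_green"] else {"kind": "fix_tests"}

def act(directive, belief):
    # Deterministic controller applies the (shield-approved) action.
    return {**belief, "tests_green": True}

def verify(belief):
    # Harness gates: smoke → check → test, collapsed to one flag.
    return belief["tests_green"]

belief = {"tests_green": False}
for _ in range(3):                  # bounded iterations, fail-closed
    belief = observe(belief)
    directive = plan(belief)
    if directive is None:           # setpoint reached, nothing to do
        break
    belief = act(directive, belief)
    if verify(belief):              # adapt: results feed the next cycle
        break

print(belief["tests_green"])  # True
```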
This loop runs at different rates depending on the task:
| Loop | Cadence | LLM involved? | What runs |
|------|---------|---------------|-----------|
| Servo | milliseconds | No | Lints, type checks, deterministic gates |
| Execution | seconds | No | CI pipelines, test suites |
| Supervisory | seconds-minutes | Yes | Goal setting, mode switching, tool selection |
| Improvement | minutes-days | Yes | Controller synthesis, model learning |
The agent handles the supervisory and improvement loops. The harness handles everything below.
Scene 4: Wire up the consciousness stack
An agent that starts blank every session will repeat solved problems, contradict prior decisions, and violate invisible constraints. The consciousness stack solves this with three memory substrates.
Substrate 1: Control metalayer (behavioral governance)
The .control/ directory is machine-readable policy:
- Setpoints = "what good looks like" (persistent goals)
- Gates = "what must never happen" (crystallized lessons from past failures)
- State.json = "where we are now" (live snapshot of project health)
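To make the substrate concrete, here is a toy reading of such a policy. The schema is invented for illustration; the real `.control/policy.yaml` layout may differ:

```python
# Hypothetical policy shape: setpoints pull toward "good", gates forbid "never".
policy = {
    "setpoints": [{"metric": "test_pass_rate", "target": 1.0}],
    "gates": [{"name": "no-force-push", "forbid": "git push --force"}],
}
state = {"test_pass_rate": 0.97}  # stand-in for the live state snapshot

def gate_blocks(command: str) -> bool:
    # A gate is a crystallized lesson: if the command matches, stop.
    return any(g["forbid"] in command for g in policy["gates"])

def setpoint_error(metric: str) -> float:
    # Distance from "what good looks like" drives the next control cycle.
    target = next(s["target"] for s in policy["setpoints"] if s["metric"] == metric)
    return target - state[metric]

print(gate_blocks("git push --force origin main"))  # True: blocked
print(round(setpoint_error("test_pass_rate"), 2))   # 0.03
```

The point of making this machine-readable is that every agent session can evaluate the same checks the same way, with no prompt-level interpretation.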
Substrate 2: Knowledge graph (declarative memory)
The Obsidian bridge captures per-session conversation docs with full reasoning chains, wikilinks for cross-referencing, and YAML frontmatter taxonomy for navigation:
```bash
# Bridge conversation logs to Obsidian
make conversations
```
This creates searchable, interlinked session docs. When an agent needs context on a prior decision, it traverses the graph instead of guessing.
Substrate 3: Conversation logs (episodic memory)
Dual-source parsing of event logs and transcripts, with noise filtering, callout formatting, and incremental generation. Every session is captured, indexed, and available for future recall.
How lessons graduate
The system has a deliberate graduation path for knowledge:
Agent encounters failure → fix applied (working memory)
→ User corrects behavior → feedback saved (auto-memory)
→ Session captured with reasoning chain (conversation log)
→ Pattern recurs across sessions → documented (knowledge graph)
→ Pattern is enforceable → added as gate (.control/policy.yaml)
→ Pattern is foundational → added to invariants (CLAUDE.md)
Each level is more permanent and more expensive to change. An agent that has run 100 sessions in a project carries the accumulated wisdom of all 100 — without loading all 100 into context.
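As a toy model, the graduation path can be read as a function from accumulated evidence to substrate. The thresholds below are invented; the real criteria live in the stack itself:

```python
# Toy model of the graduation path: a lesson moves to a more permanent
# substrate as evidence accumulates. The thresholds are invented.

def graduate(recurrences: int, enforceable: bool, foundational: bool) -> str:
    if foundational:
        return "invariant"           # CLAUDE.md: most permanent, costliest to change
    if enforceable and recurrences >= 3:
        return "policy_gate"         # .control/policy.yaml
    if recurrences >= 3:
        return "knowledge_graph"     # documented recurring pattern
    if recurrences >= 1:
        return "conversation_log"    # session captured with reasoning chain
    return "working_memory"          # fix applied, nothing persisted yet

print(graduate(0, False, False))  # working_memory
print(graduate(3, True, False))   # policy_gate
```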
Scene 5: Orchestrate with Symphony
Single-agent control loops are powerful. Multi-agent orchestration is where the leverage compounds.
Symphony is a runtime daemon that listens to your issue tracker (Linear, GitHub) and dispatches agents per ticket. The pattern is: poll → dispatch → per-issue worker → reconcile.
Initialize Symphony in your project:
```bash
symphony init
```
Configure WORKFLOW.md with your tracker and runtime:
```yaml
tracker:
  kind: linear
  project: "MY-PROJECT"
runtime:
  kind: arcan
  base_url: "http://localhost:3000"
policy:
  allow_capabilities: ["fs:read:**", "fs:write:**", "exec:*"]
lifecycle:
  before_run: "make smoke"
  after_run: "make check"
```
When a ticket lands in your project, Symphony:
- Polls the tracker for new or updated issues
- Dispatches a worker agent per ticket in an isolated workspace
- Runs the `before_run` hook — if it fails, the worker aborts (fail-closed)
- Renders a prompt template with ticket context and project conventions
- Drives the agent through a turn loop (up to `max_turns`)
- Runs the `after_run` hook for validation
- Reconciles — kills stalled workers, checks states, cleans up
Each worker gets its own isolated workspace (path-contained after canonicalization). Agents cannot escape their sandbox. The approval posture is fail-closed — if anything is uncertain, the system stops.
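A minimal sketch of that lifecycle, with the tracker, hooks, and agent runtime replaced by stand-ins, looks like this:

```python
# Sketch of poll → dispatch → reconcile. Only the fail-closed shape is the point.

def before_run() -> bool:   # stand-in for the `make smoke` lifecycle hook
    return True

def after_run() -> bool:    # stand-in for the `make check` lifecycle hook
    return True

def run_turns(ticket: dict, max_turns: int = 8) -> bool:
    # Drive the agent turn loop; here we pretend one turn suffices.
    return max_turns > 0

def dispatch(ticket: dict) -> str:
    if not before_run():
        return "aborted"             # fail-closed: hook failure stops the worker
    if not run_turns(ticket):
        return "stalled"             # a reconcile pass would kill this worker
    if not after_run():
        return "failed_validation"
    return "done"

tickets = [{"id": "TICKET-1", "title": "fix flaky test"}]  # hypothetical poll result
results = {t["id"]: dispatch(t) for t in tickets}          # one worker per ticket
print(results)  # {'TICKET-1': 'done'}
```

Note that every early return maps to a terminal worker state; nothing proceeds past a failed hook, which is what "fail-closed" means in practice.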
What this looks like in practice
In a single session, this setup has handled over 40 tickets and pull requests autonomously. The agent:
- Creates a PR from a Linear ticket
- Reviews its own changes
- Responds to CI failures, iterates until checks pass
- Closes irrelevant tickets with reasoning
- Merges when all gates are green
Pull requests are the natural guardrail. They give humans a review point without slowing down the autonomous loop.
Scene 6: Self-improvement with EGRI
The final scene is where the system learns to improve itself.
EGRI (Evaluator-Governed Recursive Improvement) is the framework for turning vague optimization goals into bounded, safe improvement loops. The core constraint:
The evaluator is immutable. The artifact is mutable. The evaluator determines what "better" means. The artifact is what gets better.
Define a problem spec for your controller:
```yaml
# problem-spec.control.yaml
artifact: "src/controllers/ticket-handler.ts"
evaluator: "tests/eval/ticket-handler-eval.ts"
mutation_surface:
  - "prompt templates"
  - "tool selection logic"
  - "retry policy"
promotion_policy:
  metric: "ticket_resolution_rate"
  threshold: 0.85
  window: "7d"
  rollback: "git revert HEAD"
```
The EGRI loop:
- Mutate — the agent proposes a change to the artifact within the defined mutation surface
- Evaluate — the immutable evaluator scores the change against defined metrics
- Promote or rollback — if the score exceeds the threshold, the change is promoted. Otherwise, it is rolled back
- Log — every trial is recorded to the Lago ledger with full trace data
- Repeat — the loop continues, each iteration informed by the history of prior trials
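The promote-or-rollback discipline fits in a few lines. The metric values and proposal sequence below are invented for illustration; the point is that the evaluator never changes while the artifact does:

```python
# Immutable evaluator: defines what "better" means and never changes.
def evaluate(artifact: float) -> float:
    return round(artifact, 3)

THRESHOLD = 0.85                       # promotion threshold
proposed = [0.10, -0.05, 0.12, 0.08]   # deterministic stand-in for agent proposals

artifact, ledger = 0.60, []
for trial, delta in enumerate(proposed):
    candidate = artifact + delta                     # mutate within the surface
    promoted = evaluate(candidate) > evaluate(artifact)
    if promoted:
        artifact = candidate                         # promote the change
    # otherwise the candidate is simply discarded: an implicit rollback
    ledger.append((trial, evaluate(candidate), promoted))  # every trial is logged
    if evaluate(artifact) >= THRESHOLD:
        break

print(evaluate(artifact))  # 0.9
```

Even in this toy form, the second proposal is rejected and never touches the artifact, while the ledger still records that it was tried.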
The autoany modules handle the hard parts: dead-end detection (dead_ends.rs), stagnation detection across trials (stagnation.rs), strategy distillation from history (strategy.rs), and cross-run knowledge inheritance (inheritance.rs).
Over time, the system converges. Not because the model gets smarter, but because the governance infrastructure around it accumulates knowledge, crystallizes lessons into gates, and narrows the space of possible mistakes.
The thesis
Autonomous development is not about smarter models. It is about better control surfaces.
The repository is the plant. The agent is the controller. The harness is the safety shield. The consciousness stack is the memory. Symphony is the orchestrator. EGRI is the improvement loop.
Every piece has a typed interface and an explicit safety boundary. Every mutation is logged. Every lesson graduates from working memory to permanent policy.
The stack is open and modular. Start with Scene 1 — install bstack. Each scene you complete compounds the capability of every scene that follows.
```bash
npx skills add broomva/bstack
```
The full skills catalog, architecture diagrams, and reference docs are at broomva.tech/skills.