Most teams stop at "AI writes code." The interesting question is what happens after — who reviews it, who tests it, who decides it is safe to merge, and who learns from the outcome.
This post is a storyboard: six scenes that walk you from an empty terminal to a repository that operates itself. Each scene builds on the last. By the end, your agents will pick up tickets, create pull requests, review their own changes, iterate on feedback, and merge — all governed by typed safety contracts and cross-session memory.
The stack is called bstack — 24 curated agent skills across 7 layers. The foundation is the agentic control kernel, a control-systems metalayer that treats your repository as a plant and your AI agents as controllers.
Scene 1: Install the stack
Everything starts with a single command.
```bash
npx skills add broomva/bstack
```
This installs the meta-skill. From here, bootstrap all 24 skills:
```bash
bash ~/.agents/skills/bstack/scripts/bootstrap.sh
```
Verify the installation:
```bash
bash ~/.agents/skills/bstack/scripts/validate.sh
```
You now have seven layers of capability installed:
| Layer | What it gives you | First command |
|-------|------------------|---------------|
| Foundation | Safety gates, harness commands, AGENTS.md | bootstrap control metalayer |
| Memory | Cross-session context, prompt library | save this as a prompt |
| Orchestration | Agent dispatch, self-improvement | symphony init |
| Research | Deep analysis, competitive intel | deep research on X |
| Design | Glass UI, production templates | create an arcan-glass component |
| Platform | Decision tools, content pipeline | optimize this decision |
| Strategy | Risk analysis, daily briefs, decision logs | pre-mortem this project |
The layers compound. Each one you activate multiplies what the layers above can do.
Scene 2: Bootstrap the control metalayer
The control metalayer is what turns a repository from a passive code store into an active control surface.
Initialize it in any project:
```bash
python3 scripts/control_kernel_init.py . --profile governed
```
This installs four artifacts:
- `.control/policy.yaml` — setpoints with explicit metrics and thresholds
- `schemas/` — typed state, action, trace, and evaluator JSON schemas
- `METALAYER.md` — the control loop definition with plant/shield/estimator sections
- Harness gates wired to `make smoke`, `make check`, and `make control-audit`
The core law of the control kernel:
Do not grant an agent more mutation freedom than your evaluator can reliably judge.
In practice, this means your agents operate within typed boundaries. They emit control directives — not raw actions. A safety shield filters every proposed change before it reaches the codebase.
```
Plant → observe() → Runtime → update estimator → belief state
  → LLM Agent: decision(belief) → θ_t (typed directive)
  → Controller: propose(belief, θ_t) → proposed action
  → Safety Shield: filter(action, belief) → safe action + certificate
  → Plant: apply(safe action) → result
  → Evaluator/Ledger: append trace + score
```
The agent never touches the plant directly. Every mutation passes through the shield. Every mutation is logged.
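In miniature, the shield contract can be sketched like this. The directive fields, the allowed action kinds, and the certificate shape are all invented for illustration; they are not the kernel's actual API:

```python
from dataclasses import dataclass

# Hypothetical typed directive: the agent decides *what*, never mutates directly.
@dataclass(frozen=True)
class Directive:
    kind: str       # e.g. "open_pr", "edit_file"
    target: str     # path or resource the action touches
    rationale: str  # why the agent chose this

@dataclass(frozen=True)
class Certificate:
    allowed: bool
    reason: str

# Illustrative shield: deny-by-default over an invented mutation surface.
ALLOWED_PREFIXES = ("src/", "tests/")

def shield(directive: Directive) -> Certificate:
    if directive.kind not in {"open_pr", "edit_file"}:
        return Certificate(False, f"unknown action kind: {directive.kind}")
    if not directive.target.startswith(ALLOWED_PREFIXES):
        return Certificate(False, f"target outside mutation surface: {directive.target}")
    return Certificate(True, "within policy")

print(shield(Directive("edit_file", "src/app.py", "fix failing test")).allowed)  # True
print(shield(Directive("edit_file", "/etc/passwd", "oops")).allowed)             # False
```

Whatever the real schema looks like, the shape is the same: the agent's output is data, and only the shield decides whether that data becomes an action.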
Scene 3: The control loop in action
With the metalayer in place, your agent operates in a classical control loop applied to software development:
Observe. Read the current state — open issues, test results, metrics, recent commits. The belief state b_t includes not just plant observations but prior session context from the consciousness stack.
Plan. The LLM reasons about what change would improve the system. It outputs a typed control directive θ_t — not raw code, but a structured decision about what to do and why.
Act. A deterministic controller module translates the directive into a concrete action: create a branch, write code, open a PR. The safety shield validates the action against policy constraints before execution.
Verify. Harness gates run automatically — make smoke → make check → make test. If any gate fails, the agent diagnoses the failure and iterates.
Adapt. Results feed back into the belief state. The ledger records what happened. The next cycle starts with richer context.
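One pass of the loop can be written as a few lines of Python. The function bodies here are stand-ins (the real observe step reads issues and test results, and the real plan step calls the LLM); only the control structure is meant to be faithful:

```python
# Minimal sketch of observe → plan → act → verify → adapt.

def observe(state):
    # Merge fresh plant observations into the belief state (stubbed).
    return dict(state)

def plan(belief):
    # The LLM would emit a typed directive here; we hard-code one.
    return None if belief["tests_green"] else {"kind": "fix_tests"}

def act(directive, belief):
    # Deterministic controller applies the (shield-approved) action.
    return {**belief, "tests_green": True}

def verify(belief):
    # Harness gates: smoke → check → test, collapsed to one flag.
    return belief["tests_green"]

belief = {"tests_green": False}
for _ in range(3):                  # bounded iterations, fail-closed
    belief = observe(belief)
    directive = plan(belief)
    if directive is None:           # setpoint reached, nothing to do
        break
    belief = act(directive, belief)
    if verify(belief):              # adapt: results feed the next cycle
        break

print(belief["tests_green"])  # True
```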
This loop runs at different rates depending on the task:
| Loop | Cadence | LLM involved? | What runs |
|------|---------|---------------|-----------|
| Servo | milliseconds | No | Lints, type checks, deterministic gates |
| Execution | seconds | No | CI pipelines, test suites |
| Supervisory | seconds-minutes | Yes | Goal setting, mode switching, tool selection |
| Improvement | minutes-days | Yes | Controller synthesis, model learning |
The agent handles the supervisory and improvement loops. The harness handles everything below.
Scene 4: Wire up the consciousness stack
An agent that starts blank every session will repeat solved problems, contradict prior decisions, and violate invisible constraints. The consciousness stack solves this with three memory substrates.
Substrate 1: Control metalayer (behavioral governance)
The .control/ directory is machine-readable policy:
- Setpoints = "what good looks like" (persistent goals)
- Gates = "what must never happen" (crystallized lessons from past failures)
- State.json = "where we are now" (live snapshot of project health)
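To make the substrate concrete, here is a toy reading of such a policy. The schema is invented for illustration; the real `.control/policy.yaml` layout may differ:

```python
# Hypothetical policy shape: setpoints pull toward "good", gates forbid "never".
policy = {
    "setpoints": [{"metric": "test_pass_rate", "target": 1.0}],
    "gates": [{"name": "no-force-push", "forbid": "git push --force"}],
}
state = {"test_pass_rate": 0.97}  # stand-in for the live state snapshot

def gate_blocks(command: str) -> bool:
    # A gate is a crystallized lesson: if the command matches, stop.
    return any(g["forbid"] in command for g in policy["gates"])

def setpoint_error(metric: str) -> float:
    # Distance from "what good looks like" drives the next control cycle.
    target = next(s["target"] for s in policy["setpoints"] if s["metric"] == metric)
    return target - state[metric]

print(gate_blocks("git push --force origin main"))  # True: blocked
print(round(setpoint_error("test_pass_rate"), 2))   # 0.03
```

The point of making this machine-readable is that every agent session can evaluate the same checks the same way, with no prompt-level interpretation.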
Substrate 2: Knowledge graph (declarative memory)
The Obsidian bridge captures per-session conversation docs with full reasoning chains, wikilinks for cross-referencing, and YAML frontmatter taxonomy for navigation:
```bash
# Bridge conversation logs to Obsidian
make conversations
```
This creates searchable, interlinked session docs. When an agent needs context on a prior decision, it traverses the graph instead of guessing.
Substrate 3: Conversation logs (episodic memory)
Dual-source parsing of event logs and transcripts, with noise filtering, callout formatting, and incremental generation. Every session is captured, indexed, and available for future recall.
How lessons graduate
The system has a deliberate graduation path for knowledge:
Agent encounters failure → fix applied (working memory)
→ User corrects behavior → feedback saved (auto-memory)
→ Session captured with reasoning chain (conversation log)
→ Pattern recurs across sessions → documented (knowledge graph)
→ Pattern is enforceable → added as gate (.control/policy.yaml)
→ Pattern is foundational → added to invariants (CLAUDE.md)
Each level is more permanent and more expensive to change. An agent that has run 100 sessions in a project carries the accumulated wisdom of all 100 — without loading all 100 into context.
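As a toy model, the graduation path can be read as a function from accumulated evidence to substrate. The thresholds below are invented; the real criteria live in the stack itself:

```python
# Toy model of the graduation path: a lesson moves to a more permanent
# substrate as evidence accumulates. The thresholds are invented.

def graduate(recurrences: int, enforceable: bool, foundational: bool) -> str:
    if foundational:
        return "invariant"           # CLAUDE.md: most permanent, costliest to change
    if enforceable and recurrences >= 3:
        return "policy_gate"         # .control/policy.yaml
    if recurrences >= 3:
        return "knowledge_graph"     # documented recurring pattern
    if recurrences >= 1:
        return "conversation_log"    # session captured with reasoning chain
    return "working_memory"          # fix applied, nothing persisted yet

print(graduate(0, False, False))  # working_memory
print(graduate(3, True, False))   # policy_gate
```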
Scene 5: Orchestrate with Symphony
Single-agent control loops are powerful. Multi-agent orchestration is where the leverage compounds.
Symphony is a runtime daemon that listens to your issue tracker (Linear, GitHub) and dispatches agents per ticket. The pattern is: poll → dispatch → per-issue worker → reconcile.
Initialize Symphony in your project:
```bash
symphony init
```
Configure WORKFLOW.md with your tracker and runtime:
```yaml
tracker:
  kind: linear
  project: "MY-PROJECT"
runtime:
  kind: arcan
  base_url: "http://localhost:3000"
policy:
  allow_capabilities: ["fs:read:**", "fs:write:**", "exec:*"]
lifecycle:
  before_run: "make smoke"
  after_run: "make check"
```
When a ticket lands in your project, Symphony:
- Polls the tracker for new or updated issues
- Dispatches a worker agent per ticket in an isolated workspace
- Runs the `before_run` hook — if it fails, the worker aborts (fail-closed)
- Renders a prompt template with ticket context and project conventions
- Drives the agent through a turn loop (up to `max_turns`)
- Runs the `after_run` hook for validation
- Reconciles — kills stalled workers, checks states, cleans up
Each worker gets its own isolated workspace (path-contained after canonicalization). Agents cannot escape their sandbox. The approval posture is fail-closed — if anything is uncertain, the system stops.
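A minimal sketch of that lifecycle, with the tracker, hooks, and agent runtime replaced by stand-ins, looks like this:

```python
# Sketch of poll → dispatch → reconcile. Only the fail-closed shape is the point.

def before_run() -> bool:   # stand-in for the `make smoke` lifecycle hook
    return True

def after_run() -> bool:    # stand-in for the `make check` lifecycle hook
    return True

def run_turns(ticket: dict, max_turns: int = 8) -> bool:
    # Drive the agent turn loop; here we pretend one turn suffices.
    return max_turns > 0

def dispatch(ticket: dict) -> str:
    if not before_run():
        return "aborted"             # fail-closed: hook failure stops the worker
    if not run_turns(ticket):
        return "stalled"             # a reconcile pass would kill this worker
    if not after_run():
        return "failed_validation"
    return "done"

tickets = [{"id": "TICKET-1", "title": "fix flaky test"}]  # hypothetical poll result
results = {t["id"]: dispatch(t) for t in tickets}          # one worker per ticket
print(results)  # {'TICKET-1': 'done'}
```

Note that every early return maps to a terminal worker state; nothing proceeds past a failed hook, which is what "fail-closed" means in practice.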
What this looks like in practice
In a single session, this setup has handled over 40 tickets and pull requests autonomously. The agent:
- Creates a PR from a Linear ticket
- Reviews its own changes
- Responds to CI failures, iterates until checks pass
- Closes irrelevant tickets with reasoning
- Merges when all gates are green
Pull requests are the natural guardrail. They give humans a review point without slowing down the autonomous loop.
Scene 6: Self-improvement with EGRI
The final scene is where the system learns to improve itself.
EGRI (Evaluator-Governed Recursive Improvement) is the framework for turning vague optimization goals into bounded, safe improvement loops. The core constraint:
The evaluator is immutable. The artifact is mutable. The evaluator determines what "better" means. The artifact is what gets better.
Define a problem spec for your controller:
```yaml
# problem-spec.control.yaml
artifact: "src/controllers/ticket-handler.ts"
evaluator: "tests/eval/ticket-handler-eval.ts"
mutation_surface:
  - "prompt templates"
  - "tool selection logic"
  - "retry policy"
promotion_policy:
  metric: "ticket_resolution_rate"
  threshold: 0.85
  window: "7d"
  rollback: "git revert HEAD"
```
The EGRI loop:
- Mutate — the agent proposes a change to the artifact within the defined mutation surface
- Evaluate — the immutable evaluator scores the change against defined metrics
- Promote or rollback — if the score exceeds the threshold, the change is promoted. Otherwise, it is rolled back
- Log — every trial is recorded to the Lago ledger with full trace data
- Repeat — the loop continues, each iteration informed by the history of prior trials
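The promote-or-rollback discipline fits in a few lines. The metric values and proposal sequence below are invented for illustration; the point is that the evaluator never changes while the artifact does:

```python
# Immutable evaluator: defines what "better" means and never changes.
def evaluate(artifact: float) -> float:
    return round(artifact, 3)

THRESHOLD = 0.85                       # promotion threshold
proposed = [0.10, -0.05, 0.12, 0.08]   # deterministic stand-in for agent proposals

artifact, ledger = 0.60, []
for trial, delta in enumerate(proposed):
    candidate = artifact + delta                     # mutate within the surface
    promoted = evaluate(candidate) > evaluate(artifact)
    if promoted:
        artifact = candidate                         # promote the change
    # otherwise the candidate is simply discarded: an implicit rollback
    ledger.append((trial, evaluate(candidate), promoted))  # every trial is logged
    if evaluate(artifact) >= THRESHOLD:
        break

print(evaluate(artifact))  # 0.9
```

Even in this toy form, the second proposal is rejected and never touches the artifact, while the ledger still records that it was tried.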
The autoany modules handle the hard parts: dead-end detection (dead_ends.rs), stagnation detection across trials (stagnation.rs), strategy distillation from history (strategy.rs), and cross-run knowledge inheritance (inheritance.rs).
Over time, the system converges. Not because the model gets smarter, but because the governance infrastructure around it accumulates knowledge, crystallizes lessons into gates, and narrows the space of possible mistakes.
The thesis
Autonomous development is not about smarter models. It is about better control surfaces.
The repository is the plant. The agent is the controller. The harness is the safety shield. The consciousness stack is the memory. Symphony is the orchestrator. EGRI is the improvement loop.
Every piece has a typed interface and an explicit safety boundary. Every mutation is logged. Every lesson graduates from working memory to permanent policy.
The stack is open and modular. Start with Scene 1 — install bstack. Each scene you complete compounds the capability of every scene that follows.
```bash
npx skills add broomva/bstack
```
The full skills catalog, architecture diagrams, and reference docs are at broomva.tech/skills.