
The gap between "AI writes code" and "AI ships features autonomously" is not a model problem. It is an infrastructure problem.
A better model does not fix sessions that die when your laptop sleeps. It does not create guardrails that prevent an agent from deleting your production branch. It does not evaluate which of four competing pull requests best satisfies a ticket's acceptance criteria.
This guide covers how to build autonomous development workflows that actually run — persistent sessions, parallel agents, remote interfaces, automated PR loops, and the governance layer that keeps it all from going sideways.
What you need
- Claude Code installed and authenticated
- A git repository you want to automate
- tmux for persistent sessions
- A task tracker with an API (Linear, GitHub Issues, etc.)
Optional but powerful:
- Discord or Telegram for remote agent control
- A VPS or always-on machine for continuous operation
Part 1: Persistent sessions with tmux
The first problem to solve is session persistence. AI coding sessions are long-running — a complex feature might take 30 minutes of autonomous work. If your SSH connection drops, your terminal closes, or your machine sleeps, the session dies and all context is lost.
tmux solves this. It is a terminal multiplexer that keeps sessions alive in the background, independent of your connection.
Basic setup
# Install tmux
brew install tmux # macOS
sudo apt install tmux # Ubuntu/Debian
# Create a named session for your project
tmux new-session -d -s myproject
# Attach to it
tmux attach -t myproject
Running Claude in a persistent session
The pattern is: create a tmux session, start Claude Code inside it, detach, and reconnect whenever you want.
# Create a session and start Claude in your project directory
tmux new-session -d -s feature-auth -c ~/projects/myapp
tmux send-keys -t feature-auth 'claude' Enter
# Send a prompt to the running session
tmux send-keys -t feature-auth \
'Research the auth module and create a PR that adds OAuth2 support' Enter
# Detach — the session keeps running
# Reattach later from any terminal, any device via SSH
tmux attach -t feature-auth
Multiple parallel sessions
This is where it gets interesting. You can run multiple Claude sessions simultaneously, each working on a different task:
# Session 1: Working on authentication
tmux new-session -d -s auth -c ~/projects/myapp
tmux send-keys -t auth 'claude' Enter
tmux send-keys -t auth 'Implement OAuth2 provider support per ticket AUTH-42' Enter
# Session 2: Working on billing
tmux new-session -d -s billing -c ~/projects/myapp
tmux send-keys -t billing 'claude' Enter
tmux send-keys -t billing 'Add Stripe subscription tiers per ticket BILL-17' Enter
# Session 3: Watchdog reviewing the other sessions' output
tmux new-session -d -s reviewer -c ~/projects/myapp
tmux send-keys -t reviewer 'claude' Enter
tmux send-keys -t reviewer 'Monitor open PRs and provide code review feedback' Enter
# List all running sessions
tmux list-sessions
Each session operates in its own git worktree or branch, producing independent pull requests. The sessions survive SSH disconnects, machine sleep, and network interruptions.
Useful tmux commands
tmux list-sessions # See all active sessions
tmux attach -t <name> # Reconnect to a session
tmux kill-session -t <name> # Stop a session
tmux send-keys -t <name> 'text' Enter # Send input to a session
Part 2: Remote interfaces — control agents from anywhere
Running sessions on a server is useful, but you also want to interact with them from your phone, a different computer, or while away from your desk. Chat platforms make good control surfaces, and a small bridge bot is enough to wire them to your tmux sessions.
Discord as a project control surface
Each Discord channel maps to a project folder. You interact with Claude through messages, and it runs commands in the corresponding tmux session on your server.
The setup is a bot process running on the same machine as your sessions. It listens for messages in each channel, forwards them to the matching tmux session with tmux send-keys, and posts the session's output back to the channel.
Once configured, each channel becomes a persistent Claude session. You can:
- Send prompts from your phone
- Receive PR links and status updates
- Create new task threads within channels
- Monitor multiple projects from a single Discord server
Telegram for mobile-first control
Same pattern, different interface: a Telegram bot bridging to the same tmux sessions works well for quick commands and status checks.
The key insight is that these are not chat bots generating responses. They are interfaces to full Claude Code sessions running on your server, with access to your filesystem, git repos, and all installed tools.
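A minimal version of this bridge is easy to sketch. The helper below is hypothetical (not part of Claude Code or tmux); it turns an incoming chat message into a tmux send-keys call, and a Discord or Telegram bot would invoke it from its message handler:

```python
import subprocess

def relay_to_tmux(session: str, message: str, dry_run: bool = False) -> list[str]:
    """Relay a chat message into a running tmux session as keystrokes.

    Arguments are passed as an argv list, so the message is never
    interpreted by a shell. Returns the command for inspection.
    """
    cmd = ["tmux", "send-keys", "-t", session, message, "Enter"]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

The reverse direction (posting session output back to the channel) can poll `tmux capture-pane -t <session> -p` on an interval.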
Part 3: From tickets to pull requests — the autonomous loop
A single persistent session is useful. The real power comes when you wire it into a complete loop: ticket intake, research, implementation, testing, PR creation, code review, and iteration.
The autonomous engineer prompt
The core prompt that drives autonomous work follows this structure:
You are an autonomous senior engineer working on this project.
1. Read the ticket details and all linked references
2. Research the current codebase — understand the dependency chain
3. Implement the solution following project conventions
4. Run all checks: lint, type-check, test
5. Create a pull request with a clear description
6. Address any CI failures or review feedback
7. Continue until the PR is mergeable
Definition of done: PR passes all CI checks, follows the control
layer definitions, and includes tests for new functionality.
This prompt compounds with the project's existing governance. If the repo has a CLAUDE.md with conventions, a .control/policy.yaml with gates, and pre-commit hooks, the agent inherits all of it automatically. You do not need to repeat "remember to lint" in every prompt because the harness enforces it.
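When dispatching from a task tracker, it helps to render this prompt from the ticket programmatically. A small sketch; the ticket field names are assumptions, not a real tracker schema:

```python
AUTONOMOUS_PROMPT = """\
You are an autonomous senior engineer working on this project.

Ticket {ticket_id}: {title}

1. Read the ticket details and all linked references
2. Research the current codebase and understand the dependency chain
3. Implement the solution following project conventions
4. Run all checks: lint, type-check, test
5. Create a pull request with a clear description
6. Address any CI failures or review feedback
7. Continue until the PR is mergeable
"""

def build_prompt(ticket: dict) -> str:
    """Render the autonomous-engineer prompt for one ticket."""
    return AUTONOMOUS_PROMPT.format(ticket_id=ticket["id"], title=ticket["title"])
```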
Parallel agents on different tickets
When you have a backlog of independent tickets, you can dispatch agents in parallel using git worktrees:
# Create isolated worktrees for parallel work
git worktree add ../myapp-auth feature/auth
git worktree add ../myapp-billing feature/billing
git worktree add ../myapp-ui feature/ui-refresh
# Launch Claude in each worktree
tmux new-session -d -s auth -c ../myapp-auth
tmux send-keys -t auth 'claude "Implement AUTH-42: OAuth2 support"' Enter
tmux new-session -d -s billing -c ../myapp-billing
tmux send-keys -t billing 'claude "Implement BILL-17: subscription tiers"' Enter
tmux new-session -d -s ui -c ../myapp-ui
tmux send-keys -t ui 'claude "Implement UI-8: refresh dashboard layout"' Enter
Each agent works in complete isolation — different branches, different directories, no file conflicts. Each produces its own PR.
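The worktree-plus-tmux dispatch above can be wrapped in a short script. This sketch builds the same commands for one ticket, with a dry_run flag so the plan can be inspected first; the session, branch, and worktree naming is an illustrative convention, not a requirement:

```python
import subprocess

def dispatch(ticket_id: str, prompt: str, repo: str = ".",
             dry_run: bool = False) -> list[list[str]]:
    """Create an isolated worktree and a detached tmux session running Claude."""
    session = ticket_id.lower()           # e.g. AUTH-42 -> auth-42
    branch = f"feature/{session}"
    worktree = f"../worktree-{session}"
    cmds = [
        ["git", "-C", repo, "worktree", "add", worktree, "-b", branch],
        ["tmux", "new-session", "-d", "-s", session, "-c", worktree],
        ["tmux", "send-keys", "-t", session, f'claude "{prompt}"', "Enter"],
    ]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```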
Part 4: The feedback loop — automated PR review and iteration
Creating a PR is only half the loop. The other half is getting feedback and iterating until the PR meets quality standards.
What a good feedback loop looks like
Agent creates PR
→ CI runs (lint, type-check, tests, build)
→ Pre-commit hooks enforce formatting and secrets scanning
→ Another agent reviews the diff for code quality
→ Review comments posted on the PR
→ Original agent reads comments and pushes fixes
→ Loop continues until all checks pass
This is not hypothetical. Claude Code hooks make it concrete.
Hooks — the governance layer
Claude Code hooks are shell scripts that fire on specific events. They are the enforcement mechanism that makes autonomous work safe.
Here is a real hooks configuration:
{
"hooks": {
"Stop": [
{
"hooks": [{
"type": "command",
"command": "scripts/conversation-bridge-hook.sh",
"timeout": 5
}]
}
],
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "scripts/control-gate-hook.sh",
"timeout": 5
},
{
"type": "command",
"command": "scripts/regression-gate-hook.sh",
"timeout": 10
}
]
},
{
"matcher": "Write",
"hooks": [{
"type": "command",
"command": "scripts/control-gate-hook.sh",
"timeout": 5
}]
}
]
}
}
Three hooks, three purposes:
- Stop hook — After every session, the conversation bridge captures what happened (tools called, files changed, decisions made) and writes it to docs/conversations/. This is episodic memory: the next agent session can read what the last one did.
- PreToolUse safety gate — Before any Bash or Write tool call executes, the control gate checks it against policy.yaml. If the command matches a blocked pattern (force push, destructive reset, rm -rf on protected paths), the hook returns {"decision": "block"} and the agent never executes it.
- PreToolUse regression gate — Before any git commit, the regression gate maps staged files against a feature test map and prompts the agent to run E2E tests on affected areas before proceeding.
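The Stop hook's conversation bridge can start as a few lines that dump the event payload to a dated file. A minimal sketch, assuming the Stop event carries `session_id` and `transcript_path`, and writing JSON rather than the richer markdown a full bridge might produce:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_bridge_entry(event: dict, out_dir: Path) -> Path:
    """Append one session summary to the conversation log directory."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    entry = {
        "session_id": event.get("session_id", "unknown"),
        "transcript": event.get("transcript_path", ""),
        "captured_at": stamp,
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"session-{stamp}.json"
    path.write_text(json.dumps(entry, indent=2))
    return path
```

Wired as the Stop hook, a wrapper script would read the event from stdin with `json.load(sys.stdin)` and call this with `Path("docs/conversations")`.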
The safety gate in detail
The control gate reads rules dynamically from your project's policy file:
# .control/policy.yaml
gates:
hard:
- id: G1
rule: "No force push to main/master"
pattern: "git push.*--force.*(main|master)"
- id: G2
rule: "No destructive git reset"
pattern: "git reset --hard"
- id: G3
rule: "No rm -rf on home/root directories"
pattern: "rm -rf\\s+(~/|/Users/|/)"
- id: G4
rule: "No writing to secrets files"
pattern: "(\\.env|credentials|secrets)"
The hook script parses these rules and blocks matching commands before they execute:
#!/bin/bash
# control-gate-hook.sh — reads policy.yaml, blocks dangerous commands
EVENT=$(cat)
COMMAND=$(echo "$EVENT" | python3 -c "
import sys,json
d=json.load(sys.stdin)
print(d.get('tool_input',{}).get('command',''))
" 2>/dev/null)
POLICY="$(git rev-parse --show-toplevel)/.control/policy.yaml"
RESULT=$(echo "$COMMAND" | python3 -c "
import yaml, re, sys
with open('$POLICY') as f:
    policy = yaml.safe_load(f)
cmd = sys.stdin.read().strip()
for gate in policy.get('gates', {}).get('hard', []):
    pattern = gate.get('pattern', '')
    if pattern and re.search(pattern, cmd):
        print(f'{gate[\"id\"]}|{gate[\"rule\"]}')
        break
")
if [ -n "$RESULT" ]; then
    GATE_ID=$(echo "$RESULT" | cut -d'|' -f1)
    REASON=$(echo "$RESULT" | cut -d'|' -f2-)
    echo "{\"decision\": \"block\", \"reason\": \"Safety shield $GATE_ID: $REASON\"}"
fi
exit 0  # no match: empty output, the tool call proceeds
This is the critical insight: governance scales better than prompts. You do not tell the agent "please don't force push" in every prompt. You enforce it structurally, and it applies to every agent, every session, forever.
The regression gate
Before every commit, the regression gate maps staged files to feature areas and suggests which E2E tests to run:
{
"features": {
"auth": {
"patterns": ["app/(auth)/**", "app/api/auth/**", "lib/auth*"],
"scenarios": [
"Navigate to /. Click sign-in. Verify OAuth options render.",
"Access /dashboard without login. Verify redirect to sign-in."
]
},
"billing": {
"patterns": ["app/api/billing/**", "lib/stripe*"],
"scenarios": [
"Navigate to /pricing. Verify Free and Pro tiers display.",
"Click upgrade. Verify Stripe checkout loads."
]
}
}
}
When you commit changes to auth files, the hook returns an "ask" decision prompting the agent to run those specific scenarios before the commit proceeds. A 10-minute bypass stamp (make regression-stamp) lets you skip the gate after tests pass.
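The mapping step itself is only a few lines: match staged paths against each feature's glob patterns and collect the scenarios. A sketch using `fnmatch`, whose `*` also matches `/` and is therefore close enough to `**` for this purpose:

```python
from fnmatch import fnmatch

def affected_scenarios(feature_map: dict, staged: list[str]) -> list[str]:
    """Collect E2E scenarios for every feature whose patterns match a staged file."""
    scenarios: list[str] = []
    for feature in feature_map.get("features", {}).values():
        patterns = feature.get("patterns", [])
        if any(fnmatch(path, pat) for path in staged for pat in patterns):
            scenarios.extend(feature.get("scenarios", []))
    return scenarios
```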
Part 5: The control metalayer — why the harness matters more than the model
Every concept above — hooks, gates, regression maps, conversation bridges — is part of a unified governance framework called the control metalayer. It treats the repository as a control system:
| Control concept | Repository equivalent |
|---|---|
| Plant | The codebase (what changes) |
| Sensors | Tests, lints, type checks, CI |
| Actuators | Git commits, PRs, deployments |
| Controller | The AI agent |
| Safety shield | Policy gates, pre-commit hooks |
| Feedback loop | PR reviews, monitoring, alerting |
| Setpoints | Quality targets in policy.yaml |
Setpoints — measurable targets
The policy file defines measurable targets the system should maintain:
setpoints:
- id: S1
name: "gate_pass_rate"
target: 0.85
alert_below: 0.70
measurement: "smoke + check + test pass rate"
severity: blocking
- id: S10
name: "skills_installed"
target: 27
alert_below: 27
measurement: "count of installed agent skills"
severity: blocking
- id: S12
name: "hooks_wired"
target: 3
alert_below: 3
measurement: "pre-commit + Stop hook + PreToolUse gate"
severity: blocking
- id: S13
name: "bridge_freshness_seconds"
target: 120
alert_above: 3600
measurement: "seconds since last conversation bridge run"
severity: informational
These are not aspirational. They are enforced. make control-audit checks every setpoint and reports compliance:
$ make control-audit
=== Control Metalayer Audit ===
1. Policy
[ok] .control/policy.yaml
2. Governance docs
[ok] METALAYER.md
[ok] CLAUDE.md
[ok] AGENTS.md
3. Schemas
[ok] schemas/ (symlink)
[ok] 5 JSON schemas
4. Conversation bridge
[ok] hook script
[ok] bridge script
[ok] conversations MOC
5. Vault symlinks
[ok] vault conversations dir
=== Audit complete ===
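The per-setpoint check in such an audit reduces to comparing a measurement against the alert thresholds. A sketch of that comparison; the status strings are illustrative:

```python
def evaluate_setpoint(setpoint: dict, measured: float) -> str:
    """Classify a measurement as ok or alert against a setpoint's thresholds."""
    if "alert_below" in setpoint and measured < setpoint["alert_below"]:
        return "alert"
    if "alert_above" in setpoint and measured > setpoint["alert_above"]:
        return "alert"
    return "ok"
```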
Progressive crystallization — how the system improves itself
The most powerful property of this architecture is that it self-improves. Here is the graduation path for knowledge:
Session 1: Agent hits a CORS error. Fixes it with a proxy pattern.
→ Captured in conversation log (automatic, via Stop hook)
Session 2: Another agent reads the prior session. Adds a rule to AGENTS.md:
"Never call API endpoints directly from the browser."
Session 3: Rule confirmed recurring. Promoted to .control/policy.yaml
as a hard gate. Now enforced structurally — no agent can violate it.
| Layer | Where | Lifespan |
|---|---|---|
| Working memory | Context window | Single session |
| Auto-memory | ~/.claude/memory/ | Cross-session |
| Conversation logs | docs/conversations/ | Permanent |
| Working rules | AGENTS.md | Until superseded |
| Enforceable gates | .control/policy.yaml | Until policy changes |
| Invariants | CLAUDE.md | Foundational |
Knowledge only graduates when it earns its place. A one-time fix stays in conversation logs. A recurring pattern becomes a rule. A critical rule becomes a gate.
Part 6: Evaluating competing outputs
When you run parallel agents on the same task with different prompts, you end up with multiple PRs solving the same problem. How do you pick the best one?
The answer is not to compare them manually. Build evaluation into the loop:
Automated quality signals
Every PR should automatically generate:
- Line count delta — Did the PR add 500 lines or 5,000 for the same feature?
- Test coverage — Does it include tests? What is the coverage delta?
- Lint and type-check results — Clean pass or suppressed warnings?
- Build output size — Did the bundle size increase significantly?
- Load time metrics — If you have a staging environment, measure first-render time
- Memory footprint — Track RSS before and after for backend services
Embedding criteria in tickets
The most effective pattern is to define passing criteria in the ticket itself, so the agent knows what "done" looks like before it starts:
## Acceptance Criteria
- [ ] OAuth2 flow completes for Google and GitHub providers
- [ ] Existing session tokens remain valid (no migration required)
- [ ] Test coverage for auth module >= 80%
- [ ] No new dependencies > 50KB gzipped
- [ ] Load time for /login page < 2s on 3G throttle
When multiple agents produce competing PRs, each PR can be scored against these criteria automatically. The one that passes the most criteria wins — or the human reviewer uses the scores to make a fast decision instead of reading every diff line by line.
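Scoring can be as simple as counting which checkbox criteria the automated signals confirmed. A sketch, where `passed` is the set of criterion texts your checks verified; how each criterion maps to a check is left to the caller:

```python
import re

def score_pr(criteria_md: str, passed: set[str]) -> tuple[int, int]:
    """Score a PR against markdown acceptance-criteria checkboxes.

    Returns (criteria_passed, criteria_total).
    """
    items = re.findall(r"- \[[ xX]\] (.+)", criteria_md)
    return sum(1 for item in items if item.strip() in passed), len(items)
```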
Part 7: Bootstrapping a new project
Everything described above — hooks, gates, conversation bridges, regression maps, policy files — can be installed in a new project in minutes.
Using the agentic control kernel
The control kernel is the bootstrap layer. It installs the governance structure into any repository:
# Initialize the control metalayer in your project
python3 scripts/control_kernel_init.py ~/projects/myapp \
--profile governed \
--runtime arcan \
--ledger lago
This creates:
- .control/policy.yaml — Setpoints and gates
- schemas/ — Typed interfaces for state, actions, traces
- METALAYER.md — Control loop documentation
- Harness gates wired to make smoke, make check, and make control-audit
Using bstack
bstack is a collection of 27 agent skills across 7 layers that provide the full autonomous development stack:
# Check current installation status
bstack status
# Install all skills and wire the control harness
bstack bootstrap
Bootstrap installs skills, verifies governance files, sets up git hooks, wires the conversation bridge, configures the regression gate, and runs the control audit.
The skill layers:
| Layer | What it provides |
|---|---|
| Foundation | Safety gates, governance policy, harness engineering |
| Memory | Conversation bridge, knowledge graph, prompt library |
| Orchestration | Multi-agent dispatch, project scaffolding, self-improvement loops |
| Research | Multi-source research, skills inventory |
| Design | Design systems, production templates |
| Platform | Content pipeline, SEO, brand assets |
| Strategy | Pre-mortem analysis, decision logs, weekly retrospectives |
After bootstrap, verify everything is wired:
# Full health check
make bstack-check
# Verify the control metalayer
make control-audit
# Check conversation bridge freshness
ls -la ~/.cache/broomva-bridge-stamp
# Preview regression test coverage for staged changes
make regression-map
Part 8: Infrastructure reality
Autonomous agents are compute-hungry. A single Claude Code session uses meaningful CPU and memory. Running three to five in parallel on the same machine requires planning.
What to expect
- RAM: Each tmux + Claude session uses 200-500MB depending on context. Five parallel sessions means 1-2.5GB just for the agent processes, plus whatever your build tools need.
- CPU: Agents spend most time waiting for API responses, but build/test steps can spike all cores. Stagger heavy builds across sessions.
- Disk: Git worktrees are cheap (they share the object store), but node_modules per worktree adds up. Use a package manager with a shared global store (pnpm, bun) to avoid duplicating dependencies.
- Network: Each agent session makes API calls. Rate limits apply. If you hit them, the agent will retry, but concurrent sessions can amplify the problem.
Recommended setup
For serious autonomous development:
- Local: Mac with 32GB+ RAM, or a Linux workstation. Run sessions directly.
- Remote: A VPS (4+ vCPU, 16GB+ RAM) with tmux. SSH in from anywhere, or use Discord/Telegram integration for mobile access.
- Hybrid: Local for interactive work, remote for overnight batch processing of ticket backlogs.
Putting it together
The complete flow:
Ticket created in Linear/GitHub
→ Agent reads ticket (via Linear MCP or GitHub API)
→ Creates a git worktree and branch
→ Researches codebase, implements solution
→ Pre-commit hooks enforce formatting and secrets scanning
→ Control gate blocks dangerous commands (policy.yaml)
→ Regression gate prompts E2E tests on affected features
→ Agent creates PR with description and test results
→ CI runs, review agent posts feedback
→ Agent addresses feedback, pushes fixes
→ Loop continues until PR is mergeable
→ Conversation bridge captures session to knowledge graph
→ Next session reads prior sessions, inherits learned patterns
→ Rules that recur get promoted to AGENTS.md
→ Critical rules get promoted to .control/policy.yaml
→ System improves with every session
This is not a demo. It is an operating mode. The quality of the autonomy is determined by the quality of the control system — not the quality of the model.
Invest in the harness.