Problem
"Make X better" is the most common request to an autonomous agent — and the most dangerous. Without a formal framework, recursive self-improvement is unbounded: the agent mutates artifacts without knowing if the result is actually better, has no rollback path when it isn't, and compounds errors across iterations.
Approach
Autoany formalizes the pattern behind autoresearch as a reusable systems primitive. The core insight: autoresearch is not "AI doing ML research." It is a bounded closed-loop optimizer over executable artifacts. That pattern generalizes to any domain.
A problem instance is a tuple:
| Symbol | Name | Example |
|---|---|---|
| X | Artifact state space | Code, config, prompts, workflows |
| M | Mutation operators | LLM edits, parameter sweeps, template expansion |
| H | Immutable harness | Test suite, benchmark, linter |
| E | Execution backend | Local shell, sandbox, CI runner |
| J | Evaluator | Scoring function, acceptance criteria |
| C | Hard constraints | Budget caps, safety invariants |
| B | Budget policy | Max iterations, token spend, wall time |
| P | Promotion policy | Score threshold, regression gates |
| L | Ledger | Audit trail of all mutations and scores |
The core law: do not grant an agent more mutation freedom than your evaluator can reliably judge.
Architecture overview
Autoany ships as three Rust crates plus integration adapters:
autoany-core — the EGRI microkernel. Defines the loop: specify mutable surface → freeze harness → propose mutation → execute under budget → score with trusted evaluator → promote / discard / branch → record lineage → repeat. All loop state is serializable for checkpoint/resume.
autoany-aios — adapter that maps EGRI loops onto Arcan agent sessions. Each iteration is an agent turn with tool calls; the evaluator runs as a separate agent with read-only access to the artifact.
autoany-lago — persistence adapter that writes every mutation, score, and promotion decision to the Lago event journal. The full lineage of an optimization run is replayable from journal events.
The loop is deliberately simple:
```rust
while budget.has_remaining() {
    let candidate = mutator.propose(&current, &context);
    let result = harness.execute(&candidate);
    let score = evaluator.score(&result, &constraints);
    if promotion_policy.should_promote(&score, &current_score) {
        ledger.record_promotion(&candidate, &score);
        current = candidate;
        current_score = score;
    } else {
        ledger.record_discard(&candidate, &score);
    }
}
```
Use cases
EGRI applies wherever you have a mutable artifact and a way to measure quality:
- Prompt optimization — mutate system prompts, evaluate on a test suite of expected outputs
- Configuration tuning — sweep parameters, score on latency/throughput/cost metrics
- Code improvement — propose refactors, gate on test passage + coverage + lint score
- Workflow refinement — adjust pipeline stages, evaluate on end-to-end accuracy
- Model fine-tuning — select training data, score on eval benchmarks
The framework is domain-agnostic by design — the problem specification is the only thing that changes.
Current status
Active development. autoany-core has the loop runtime, checkpoint/resume, and budget enforcement. The autoany-aios and autoany-lago bridge crates are in integration. A Python test harness validates end-to-end loop behavior.
Why it matters
EGRI makes recursive improvement safe by construction. The evaluator is immutable, the budget is bounded, and every mutation is auditable. This is the difference between "let the agent try stuff" and "run a governed optimization loop with rollback."