Rocket Simulation Meets Recursive Improvement: 145 Trials, Zero Promotions, and What That Actually Proves

We turned OpenRocket into a headless simulator, ran 145 EGRI trials in 5 minutes, swept 5 motor configurations, and measured exactly why mutation surface matters more than trial count. Complete data and analysis included.

March 19, 2026

14 min read
Tags: simulation, optimization, EGRI, rocketry, autoany

We ran 145 optimization trials in 297 seconds and got zero promotions. Then we changed one variable the optimizer wasn't allowed to touch and got a 531% altitude gain. That contrast is the entire lesson.

We took OpenRocket — an open-source model rocket simulator — stripped out the GUI, and wired it into an EGRI (Evaluator-Governed Recursive Improvement) loop. The pipeline: a headless CLI that outputs structured JSON, a Python harness that runs grid sweeps with constraint checking, a JSONL ledger that records every trial, and a formal problem spec that defines the optimization contract. We then expanded the mutation surface across 5 motor configurations, 15 rocket designs, and extended parameter ranges to measure exactly where the altitude sensitivity lives.

The Stack

Layer | Tool | Role
Evaluator | OpenRocket core (Java 17) | 6DOF physics simulation, 2.06s avg per trial
CLI | rocket-sim | Headless interface: run, info, sweep, events
Harness | EGRI loop (Python) | Grid sweep, constraint checking, JSONL ledger, auto-promotion
Problem spec | problem-spec.yaml | Formal EGRI contract with objective, constraints, budget
Skill | openrocket-sim | Reusable agent skill for future sessions

Why Simulation Is the Ideal EGRI Evaluator

EGRI's core law: never grant more mutation freedom than your evaluator can reliably judge.

A physics simulation satisfies this perfectly:

  • Deterministic — same inputs always produce the same outputs (we verified: baseline altitude = 50.538m across runs)
  • Fast — 2.06s average per trial, enabling 145 trials in under 5 minutes
  • Trusted — the evaluator is 6DOF physics, not a heuristic or LLM judge
  • Structured — outputs are typed scalars: altitude, velocity, Mach, acceleration, flight time, ground hit velocity
  • Constraint-checkable — hard limits enforced per trial, violations logged to ledger

This means we can safely run in auto-promote mode — the strongest autonomy level in EGRI. No human gate needed. The evaluator is the gate.

The Rocket Under Test

The subject: "A simple model rocket" — a single-stage design with an Estes A8-3 motor.

Component Tree:
├── Nose cone    — Ogive, 100mm, 13g
├── Body tube    — 300mm × 26mm OD, 15g
│   ├── Parachute      — 300mm diameter
│   ├── Shock cord     — 1g
│   ├── Wadding        — 2g
│   ├── Launch lug
│   ├── Trapezoidal fins (×3) — 30mm span, 6g
│   ├── Centering rings (×2)
│   └── Inner Tube (motor mount) — 75mm, A8-3 motor
└── Total dry mass: ~46g

Flight timeline (A8-3 motor, sea level, no wind):

Event | Time
Launch / Motor ignition | 0.000s
Lift-off | 0.105s
Launch rod clearance | 0.247s
Motor burnout | 0.730s
Apogee (50.5m) | 3.467s
Ejection charge | 3.730s
Parachute deployment | 3.731s
Ground hit (4.7 m/s) | 15.870s

The motor burns for just 0.73 seconds. Everything after that is ballistic coast + parachute descent. This is a 16-second flight where the first 730 milliseconds determine everything.

What We Built

rocket-sim CLI

Four commands, all outputting structured JSON:

# Inspect rocket component tree + motor configurations
rocket-sim info "A simple model rocket.ork"
# → 5 flight configs: [A8-3], [B4-4], [C6-3], [C6-5], [C6-7]

# Run simulation with specific motor config
rocket-sim run "A simple model rocket.ork" 0    # A8-3 motor
rocket-sim run "A simple model rocket.ork" 1    # B4-4 motor

# Parameter sweep (single variable, N data points)
rocket-sim sweep "A simple model rocket.ork" wind_speed 0,2,5,8,10

# Flight event timeline
rocket-sim events "A simple model rocket.ork"

The CLI wraps OpenRocket's headless core module — a pure Java 17 simulation engine with zero GUI dependency. OpenRocketCore.initialize() bootstraps in one call; each Simulation.simulate() runs the full 6DOF solver.
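A harness can reach the CLI through a thin wrapper. The sketch below assumes that `rocket-sim run` prints a single JSON object on stdout and that `rocket-sim` is on the PATH; the function name `run_sim` and the injectable `runner` argument are illustrative and not part of the tool.

```python
import json
import subprocess
from typing import Callable


def run_sim(ork_file: str, config_index: int = 0,
            runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> dict:
    """Run one simulation and return the parsed JSON metrics.

    Assumption: `rocket-sim run <file> <config>` writes one JSON object to stdout.
    """
    cmd = ["rocket-sim", "run", ork_file, str(config_index)]
    # 30 s mirrors the per-trial timeout in the problem spec.
    result = runner(cmd, capture_output=True, text=True, check=True, timeout=30)
    return json.loads(result.stdout)
```

Passing a fake `runner` makes the wrapper testable without the simulator installed. Calling `run_sim` twice with the same arguments and comparing the two dicts is one way to repeat the determinism check.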

EGRI Problem Spec

The optimization was formalized as a complete EGRI contract:

name: "rocket-optimization"

objective:
  metric: max_altitude_m
  direction: maximize
  secondary_metrics: [ground_hit_velocity_ms, max_velocity_ms, flight_time_s, max_mach]

constraints:
  - "ground_hit_velocity_ms <= 10.0"   # Safe recovery
  - "max_mach < 1.0"                   # Subsonic only
  - "flight_time_s > 5.0"              # Minimum flight time
  - "runtime_s <= 30"                  # Per-trial timeout

artifacts:
  mutable: artifacts/current_params.json    # Launch parameters
  immutable: rocket-tools.jar, .ork files   # Evaluator + designs

promotion:
  policy: keep_if_improves
  threshold: 0.5  # Must improve altitude by >0.5m

autonomy:
  mode: auto-promote
  escalation_triggers:
    - constraint_violation_detected
    - 10_consecutive_trials_no_improvement
    - budget_90_percent_exhausted

budget:
  max_trials: 100
  time_per_trial_s: 30
  total_time_s: 3600

Key design decisions:

  • Auto-promote mode because the evaluator is deterministic physics — no risk of gaming
  • 0.5m promotion threshold to filter noise (the stochastic wind model adds roughly 0.1m of run-to-run variation)
  • Escalation triggers so the loop knows when to ask for help expanding the mutation surface
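The constraint and promotion rules in the spec are small enough to sketch directly. This is a minimal illustration that assumes each trial's metrics arrive as a flat dict keyed by the metric names in the spec; the function names `violated_constraints` and `should_promote` are hypothetical.

```python
PROMOTION_THRESHOLD_M = 0.5  # from the spec: must improve altitude by >0.5 m


def violated_constraints(m: dict) -> list[str]:
    """Return the spec constraints that the trial metrics violate."""
    checks = {
        "ground_hit_velocity_ms <= 10.0": m["ground_hit_velocity_ms"] <= 10.0,
        "max_mach < 1.0": m["max_mach"] < 1.0,
        "flight_time_s > 5.0": m["flight_time_s"] > 5.0,
        "runtime_s <= 30": m["runtime_s"] <= 30,
    }
    return [name for name, ok in checks.items() if not ok]


def should_promote(candidate: dict, best_altitude_m: float) -> bool:
    """keep_if_improves: promote only a feasible candidate that beats the
    current best altitude by more than the threshold."""
    if violated_constraints(candidate):
        return False
    return candidate["max_altitude_m"] - best_altitude_m > PROMOTION_THRESHOLD_M
```

Checking feasibility before comparing altitudes means a constraint-violating trial can never be promoted, however high it flies.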

Phase 1: The Grid Sweep (145 Trials)

We swept 4 launch parameters across 144 combinations plus 1 baseline:

Parameter | Values | Count
Rod length | 0.5, 1.0, 1.5, 2.0 m | 4
Rod angle | 0, 5, 10 deg | 3
Launch altitude | 0, 500, 1000, 2000 m ASL | 4
Wind speed | 0, 2, 5 m/s | 3
Total | 4 × 3 × 4 × 3 | 144
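The grid in the table can be enumerated with a Cartesian product. A minimal sketch; the parameter key names are assumptions, since the layout of current_params.json is not shown here.

```python
from itertools import product

# Values from the sweep table; key names are illustrative.
GRID = {
    "rod_length_m": [0.5, 1.0, 1.5, 2.0],
    "rod_angle_deg": [0, 5, 10],
    "launch_altitude_m": [0, 500, 1000, 2000],
    "wind_speed_ms": [0, 2, 5],
}


def grid_candidates(grid: dict) -> list[dict]:
    """Expand a {parameter: [values]} grid into one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


candidates = grid_candidates(GRID)
print(len(candidates))  # 144
```

Each candidate dict is one trial's mutable artifact; the baseline run is the 145th trial.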

Execution Metrics

Metric | Value
Wall clock time | 297.2s (4.95 min)
Total simulation time | 299.2s
Average per trial | 2.063s
Throughput | 0.49 trials/sec
Promotions | 0
Constraint violations | 0
Ledger format | JSONL, 145 entries
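A JSONL ledger is one JSON object per line, appended as each trial finishes. The sketch below shows one possible shape; the record fields (`trial`, `timestamp`, `params`, `metrics`, `promoted`) are assumptions, not the harness's actual schema.

```python
import json
import time
from pathlib import Path


def append_trial(ledger: Path, trial_id: int, params: dict,
                 metrics: dict, promoted: bool) -> None:
    """Append one trial record as a single JSON line."""
    record = {
        "trial": trial_id,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
        "promoted": promoted,
    }
    with ledger.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_ledger(ledger: Path) -> list[dict]:
    """Load every record, skipping blank lines."""
    with ledger.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Append-only writes keep earlier trials intact if a later trial crashes, and the file can be summarized afterwards without the simulator.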

Statistical Summary (All 145 Trials)

Metric | Min | Max | Mean | Std Dev | Range
max_altitude_m | 50.408 | 50.853 | 50.664 | 0.086 | 0.445
max_velocity_ms | 29.181 | 29.293 | 29.244 | 0.022 | 0.112
max_acceleration_ms2 | 143.617 | 143.698 | 143.667 | 0.015 | 0.081
max_mach | 0.086 | 0.087 | 0.086 | 0.0004 | 0.001
flight_time_s | 15.856 | 15.964 | 15.918 | 0.021 | 0.108
ground_hit_velocity_ms | 4.414 | 4.841 | 4.632 | 0.081 | 0.427
runtime_s | 1.915 | 2.473 | 2.063 | 0.090 | 0.558

The altitude range across all 145 trials: 0.445 meters. Standard deviation: 8.6 centimeters. The promotion threshold was 0.5m of improvement. Because the entire spread is smaller than the threshold, no candidate could have exceeded it, and none did.

144 grid trials altitude distribution — all data points within a 0.445m band, promotion threshold unreached

Parameter Sensitivity (Altitude Delta per Variable)

Parameter | Best Value | Worst Value | Delta
Launch altitude | 1000m ASL | 2000m ASL | 0.068m
Rod length | 2.0m | 0.5m | 0.060m
Wind speed | 0 m/s | 5 m/s | 0.027m
Rod angle | 10° | 5° | 0.010m

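Every parameter's sensitivity is measured in tens of millimeters or less. The largest effect, launch altitude at 68mm, is barely distinguishable from simulation noise, and rod length follows at 60mm. The 27mm wind speed effect is within the stochastic variance band. The best and worst values are not even monotonic (1000m ASL beats 2000m ASL, and a 10° rod angle beats 5°), which is what noise looks like.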
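One way to derive a per-parameter delta from the ledger is to average the metric over all trials that share a parameter value, then take the gap between the best and worst averages. This is an assumed method, since the exact aggregation behind the table is not stated, and it expects records with hypothetical `params` and `metrics` sub-dicts.

```python
from collections import defaultdict
from statistics import mean


def sensitivity(records: list[dict], param: str,
                metric: str = "max_altitude_m") -> tuple:
    """Return (best_value, worst_value, delta) for one swept parameter.

    Each parameter value is scored by the mean of `metric` over every
    trial that used it, which marginalizes over the other parameters.
    """
    by_value = defaultdict(list)
    for r in records:
        by_value[r["params"][param]].append(r["metrics"][metric])
    means = {value: mean(xs) for value, xs in by_value.items()}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    return best, worst, means[best] - means[worst]
```

A delta smaller than the run-to-run variation of the evaluator is a sign that the parameter is not doing measurable work.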

The grid did its job. It exhaustively proved that launch parameters are a dead mutation surface for this rocket. That is a measurement, not a failure.

Phase 2: Expanded Sweeps (What the EGRI Loop Told Us to Do)

The null result triggered the escalation logic: 10+ consecutive trials without improvement. Following EGRI protocol, the next move is clear — expand the mutation surface. We did this in three steps: push existing parameters to wider ranges, test motor configurations, and benchmark across rocket designs.

Individual Parameter Sweeps (Extended Range)

Wind speed (0–10 m/s, 11 data points) — the first parameter that actually moves the needle when pushed to extremes:

Wind (m/s) | Altitude (m) | Ground Hit (m/s) | Delta from Calm (m)
0 | 51.14 | 4.16 | baseline
1 | 50.96 | 4.33 | -0.18
2 | 50.65 | 4.67 | -0.49
3 | 50.30 | 5.40 | -0.84
4 | 49.65 | 5.40 | -1.49
5 | 48.68 | 6.69 | -2.46
6 | 47.29 | 7.35 | -3.85
7 | 47.58 | 9.12 | -3.56
8 | 46.53 | 9.26 | -4.61
9 | 46.04 | 8.91 | -5.10
10 | 45.70 | 9.82 | -5.44

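At 10 m/s wind, altitude drops 10.6% and ground hit velocity approaches the 10 m/s safety constraint. The relationship is non-linear: each additional 1 m/s of wind costs more altitude up to about 6 m/s (0.18m for the first step, 1.39m for the step from 5 to 6 m/s), after which the curve flattens and turns noisy (the 7 m/s run is 0.29m higher than the 6 m/s run).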

Wind speed sensitivity — altitude degrades non-linearly while ground hit velocity approaches constraint
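Differencing the wind table makes the shape of the curve explicit. The values below are copied from the table; the variable names are illustrative.

```python
# Altitude (m) at wind speeds 0..10 m/s, from the wind sweep table.
altitude_m = [51.14, 50.96, 50.65, 50.30, 49.65, 48.68,
              47.29, 47.58, 46.53, 46.04, 45.70]

# Altitude lost per additional 1 m/s of wind (positive means a loss).
step_loss = [round(a - b, 2) for a, b in zip(altitude_m, altitude_m[1:])]

# Index i is the step from i to i+1 m/s.
worst_step = max(range(len(step_loss)), key=step_loss.__getitem__)
total_loss_pct = round(100 * (altitude_m[0] - altitude_m[-1]) / altitude_m[0], 1)

print(worst_step)      # 5, i.e. the step from 5 to 6 m/s
print(total_loss_pct)  # 10.6
```

The step from 6 to 7 m/s comes out negative (a small gain), so the curve is not monotonic in this data.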

Rod angle (0–30°, 8 data points), with a strong effect at extreme angles:

Angle (°) | Altitude (m) | Delta (m) | Velocity (m/s)
0 | 50.71 | baseline | 29.25
2 | 50.57 | -0.14 | 29.29
5 | 49.87 | -0.84 | 29.28
10 | 48.22 | -2.49 | 29.37
15 | 45.73 | -4.98 | 29.48
20 | 42.68 | -8.03 | 29.67
25 | 38.98 | -11.73 | 29.89
30 | 35.37 | -15.34 | 30.18

At 30°, altitude drops 30% while max velocity actually increases — the rocket accelerates more on the tilted rod but wastes energy on horizontal flight. This shows why the original grid (0–10°) was too conservative to observe the effect.

Rod angle vs max altitude — 30-degree angle wastes 30% of altitude on horizontal flight

Launch altitude (0–4000m ASL, 7 data points), where thinner air means a higher apogee:

Altitude ASL (m) | Apogee (m) | Delta (m) | Ground Hit (m/s)
0 | 50.65 | baseline | 4.47
500 | 50.99 | +0.34 | 4.68
1000 | 51.17 | +0.52 | 4.87
1500 | 51.63 | +0.98 | 4.82
2000 | 51.79 | +1.14 | 5.01
3000 | 52.47 | +1.82 | 5.35
4000 | 52.93 | +2.28 | 5.45

4000m ASL (roughly La Paz, Bolivia elevation) gains 2.28m — the only parameter that consistently improves altitude. Lower air density = less drag. But the effect is still dwarfed by motor selection.

The Motor Configuration Test (The Real Mutation Surface)

The simple rocket has 5 pre-configured motor options. We ran all 5 with identical launch parameters:

Motor | Altitude (m) | Velocity (m/s) | Mach | Flight Time (s) | Ground Hit (m/s) | vs. A8-3
A8-3 | 50.7 | 29.2 | 0.086 | 15.9 | 4.60 | baseline
B4-4 | 135.1 | 53.2 | 0.157 | 37.9 | 4.66 | +167%
C6-3 | 278.7 | 95.3 | 0.281 | 72.6 | 4.67 | +450%
C6-5 | 316.5 | 95.3 | 0.281 | 83.5 | 4.48 | +525%
C6-7 | 320.0 | 95.3 | 0.281 | 84.5 | 4.51 | +531%

This is the table that proves the thesis. Switching from A8-3 to C6-7 yields a 269.3-meter gain — 605 times larger than the entire 0.445m range produced by sweeping all 144 launch parameter combinations. All five configurations pass every constraint: ground hit velocity stays under 5 m/s, Mach stays well subsonic.

The C6-3 vs. C6-5 vs. C6-7 comparison is also instructive: same motor impulse, different ejection delay. C6-7 (7-second delay) gains 41.3m over C6-3 (3-second delay) purely from better delay timing — the parachute deploys closer to apogee instead of during the coast phase.

Motor configuration comparison — same rocket, same parameters, only motor changed, showing 531% altitude gain
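The headline numbers follow directly from the motor table and the Phase 1 altitude range. A short check, using the rounded altitudes as printed; the variable names are illustrative.

```python
# Altitudes (m) copied from the motor table.
altitude_m = {"A8-3": 50.7, "B4-4": 135.1, "C6-3": 278.7,
              "C6-5": 316.5, "C6-7": 320.0}
GRID_RANGE_M = 0.445  # full altitude spread of the Phase 1 grid

baseline = altitude_m["A8-3"]
motor_gain_m = round(altitude_m["C6-7"] - baseline, 1)
motor_gain_pct = round(100 * motor_gain_m / baseline)
surface_ratio = round(motor_gain_m / GRID_RANGE_M)
delay_gain_m = round(altitude_m["C6-7"] - altitude_m["C6-3"], 1)

print(motor_gain_m, motor_gain_pct, surface_ratio, delay_gain_m)
# 269.3 531 605 41.3
```

Recomputing the intermediate rows of the "vs. A8-3" column from these rounded altitudes can land a percentage point away from the table, which presumably used unrounded values.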

Cross-Design Benchmark (15 Rockets)

We ran every example .ork file in the OpenRocket distribution:

Rocket Design | Altitude (m) | Velocity (m/s) | Mach | Flight Time (s) | Hit (m/s)
Pods (winglets) | 31.9 | 24.7 | 0.073 | 10.3 | 4.99
Simple model (A8-3) | 50.5 | 29.2 | 0.087 | 15.9 | 4.69
Clustered motors | 57.9 | 32.4 | 0.095 | 13.8 | 6.33
Base drag hack | 98.2 | 47.1 | 0.139 | 27.2 | 4.56
Three stage LP | 270.6 | 78.5 | 0.232 | 72.6 | 4.50
Tube fin | 282.5 | 119.4 | 0.351 | 75.6 | 4.43
Chute release | 307.7 | 72.1 | 0.213 | 51.8 | 5.19
Pods (powered) | 342.3 | 92.4 | 0.273 | 82.0 | 4.90
ARC payload | 452.0 | 150.3 | 0.442 | 110.2 | 4.61
Dual parachute | 593.1 | 135.4 | 0.398 | 65.1 | 4.72
Two stage HP | 676.3 | 158.9 | 0.469 | 64.5 | 6.23
Parallel booster | 1121.2 | 210.9 | 0.623 | 224.5 | 5.54
Airstart timing | 1318.5 | 186.4 | 0.550 | 93.3 | 5.29
Sim extensions | 2466.3 | 240.8 | 0.715 | 169.9 | 7.26
Sim scripting | 2464.7 | 240.8 | 0.715 | 169.6 | 7.16

A 77× altitude range from 31.9m to 2466.3m — all from the same physics engine, same weather model, same evaluation pipeline. The only variables: rocket geometry, staging, motor selection, and recovery systems.

Cross-design benchmark — 15 rockets from 31.9m to 2466.3m, 77x altitude range

Wind Sensitivity by Design Class

Wind sensitivity varies sharply by design, and it does not simply track complexity:

Rocket | Calm (m) | 10 m/s wind (m) | Loss (m) | Loss %
Simple model (A8-3) | 51.1 | 45.7 | -5.4 | -10.6%
Three stage LP | 277.2 | 203.0 | -74.2 | -26.8%
Two stage HP | 679.6 | 642.9 | -36.7 | -5.4%

The three-stage rocket loses 74 meters to 10 m/s wind — 26.8% of its calm-air altitude. The two-stage high-power design is more wind-resistant at only 5.4% loss, likely due to higher velocity and shorter coast time. This is exactly the kind of cross-design insight an EGRI loop can surface when given the right mutation surface.

The Insight: Mutation Surface Determines Everything

Here is what the data proves, with measured numbers:

Mutation Surface | Range | Effect on Altitude
Launch parameters (what we swept) | 0.445m | 0.88% of baseline
Extended wind (0→10 m/s) | 5.44m | 10.6% of baseline
Extended rod angle (0→30°) | 15.34m | 30.3% of baseline
Motor selection (A8→C6-7) | 269.3m | 531% of baseline
Different rocket design (pods→sim ext.) | 2434.4m | 7634% of lowest

The 144-trial grid sweep exhaustively proved that launch parameters are a dead mutation surface for a simple model rocket. That null result is not a failure; it is the evaluator telling you the truth. Zero promotions from an exhaustive grid means there is nothing left to find on this surface, and a best result indistinguishable from baseline means the mutable artifact does not contain the dominant variable.

This is the general principle: in any EGRI loop, the first question is not "how many trials?" but "what are we allowed to change?" If the mutable artifact does not contain the dominant variable, the loop will converge immediately to a local optimum that looks identical to the baseline. Running more trials on the wrong surface is pure waste.

Mutation surface impact — log scale showing motor selection produces 605x more altitude range than launch parameters

EGRI Anatomy of This Run

For anyone implementing EGRI in their own domain, here is how each component mapped:

EGRI Component | This Implementation
Mutable artifact | artifacts/current_params.json (4 launch parameters)
Immutable evaluator | rocket-tools.jar + OpenRocket physics engine
Objective | Maximize max_altitude_m
Constraints | ground_hit ≤ 10 m/s, Mach < 1.0, flight_time > 5s
Promotion policy | keep_if_improves with 0.5m threshold
Search strategy | grid_then_llm (Phase 1: exhaustive grid, Phase 2: LLM-guided)
Autonomy mode | auto-promote (deterministic evaluator = no human gate)
Ledger | JSONL, 145 entries, every trial with full metrics + timestamps
Budget | 100 trials max, 30s/trial timeout, 3600s total
Escalation | Triggered at 10 consecutive no-improvement trials

The escalation trigger fired at trial 10 and kept firing. In a fully autonomous system, this would have triggered mutation surface expansion automatically — adding motor selection to the mutable artifact set. We did this manually in Phase 2 to measure the effect cleanly.
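The no-improvement trigger amounts to a counter that resets on every promotion. A minimal sketch, assuming the harness sees one altitude per trial in order; the function name and signature are hypothetical.

```python
from typing import Optional


def first_escalation_trial(altitudes: list[float], baseline_m: float,
                           threshold_m: float = 0.5,
                           patience: int = 10) -> Optional[int]:
    """Return the 1-based index of the trial at which `patience`
    consecutive trials have failed to beat the current best by more
    than `threshold_m`, or None if that never happens."""
    best = baseline_m
    stale = 0
    for i, altitude in enumerate(altitudes, start=1):
        if altitude - best > threshold_m:
            best = altitude  # promotion: new best, reset the counter
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None
```

With every grid trial inside a 0.445m band, no trial can reset the counter, so the trigger fires at trial 10.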

What This Means for EGRI Generally

The rocket optimization is a clean proof of three EGRI principles:

1. A null result from a trusted evaluator is informative, not failed. Zero promotions across 144 trials proves that launch parameters are dominated. This is a measurement, not an error. The evaluator earned our trust by being deterministic physics — we believe the null result because we believe Newtonian mechanics.

2. The evaluator should outlive the mutation surface. We expanded from launch parameters to motor configs to cross-design benchmarks, all using the same rocket-sim evaluator. The evaluator infrastructure (CLI, JSON output, constraint checker, JSONL ledger) amortizes across every expansion. Build the evaluator first, then widen the search.

3. Grid search is the right first move when the search space is small. 4 × 3 × 4 × 3 = 144 combinations took 5 minutes and exhaustively covered the space. No heuristic, no LLM, no gradient. When you can enumerate, enumerate. Save the LLM-guided phase for spaces too large to grid, like component geometry. A single-motor sweep over a database of 500+ motors is still enumerable: at about 2 seconds per trial, 500 motors take roughly 17 minutes.

What Comes Next

The evaluator, harness, ledger, and constraint system are all in place. Every future expansion reuses the same pipeline. The mutation surface is ready to widen:

  1. Motor database sweep — OpenRocket ships hundreds of motor definitions (Estes, AeroTech, Cesaroni). Grid-sweep the full catalog with constraint checking. This is the obvious next EGRI loop given what the data showed.
  2. Component geometry EGRI — fin span, nose cone shape/length, body tube dimensions. This is where LLM-guided search earns its keep — the design space is too large for exhaustive grid.
  3. Multi-objective Pareto — altitude vs. ground hit velocity vs. cost. Map the Pareto frontier and let the evaluator surface the tradeoffs.
  4. Cross-design EGRI — optimize across rocket archetypes, not just parameters within one design. The 15-rocket benchmark is the seed data for this.
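For the multi-objective step, a Pareto filter over two of the metrics already in the ledger might look like the sketch below. It uses the five motor results measured above, maximizes altitude, and minimizes ground hit velocity; cost is left out because no cost data exists yet. The function name is illustrative.

```python
def pareto_front(designs: dict) -> list[str]:
    """designs maps name -> (altitude_m, ground_hit_ms).

    A design is kept if no other design is at least as good on both
    objectives and strictly better on one.
    """
    def dominates(a, b):
        return (a[0] >= b[0] and a[1] <= b[1]
                and (a[0] > b[0] or a[1] < b[1]))

    return [name for name, v in designs.items()
            if not any(dominates(w, v)
                       for other, w in designs.items() if other != name)]


# (altitude m, ground hit m/s) from the motor configuration table.
motors = {
    "A8-3": (50.7, 4.60),
    "B4-4": (135.1, 4.66),
    "C6-3": (278.7, 4.67),
    "C6-5": (316.5, 4.48),
    "C6-7": (320.0, 4.51),
}
print(pareto_front(motors))  # ['C6-5', 'C6-7']
```

On this data only C6-5 and C6-7 survive: C6-7 flies 3.5m higher, while C6-5 lands 0.03 m/s slower.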

Install the Skill

npx skills add broomva/openrocket-sim

The skill gives any Claude Code session access to the full headless simulation API, CLI tool docs, EGRI integration patterns, and 8 compounding strategies for building on top of OpenRocket.

The Thesis

Physics simulations are the strongest class of EGRI evaluator: deterministic, fast, trusted, and structured. But the quality of the optimization is bounded by the mutation surface. A perfect evaluator over the wrong variables does not fail — it succeeds at telling you that you are looking in the wrong place.

145 trials. 297 seconds. Zero promotions. One motor swap: +531%.

The evaluator does not lie. The question is whether you are listening.
