
A lot has moved since the era of PID, state-space methods, robust-modern control, ADRC (active disturbance rejection control), and TDC (time-delay control).
The big change is not that classical control got replaced. It did not. The real shift is that control is increasingly being fused with data, optimization, safety certificates, and learned models. The strongest frontier today is not "one new universal controller," but a stack of methods that combine model-based guarantees with data-driven adaptation.
This post is a map. Part one covers what genuinely advanced. Part two connects it to the architecture of LLM-based agent systems — where the control metalayer meets the modern control stack.
Part 1: What Actually Changed
1. Data-driven predictive control became serious
One of the most important developments is data-driven MPC, especially DeePC and related methods. Instead of first identifying a parametric model and then designing the controller, these methods predict future trajectories directly from data, often using behavioral systems ideas (Willems' fundamental lemma). What changed recently is that the field moved from "interesting idea" to "we can now state stability, robustness, and constraint-satisfaction guarantees under clearer assumptions."
Why it matters relative to classical methods:
- Compared with classical MPC, it reduces model-identification burden
- Compared with pure adaptive control, it is much more naturally constraint-aware
- Compared with heuristic data-driven tuning, it is becoming more theoretically grounded
What is still hard: noise, closed-loop identification bias, computational scaling, and nonlinear systems without losing guarantees. Recent work is explicitly focused on those issues — regularized DeePC, distributionally robust DeePC, and recursive feasibility results are narrowing the gap.
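To make the core idea concrete, here is a minimal noise-free DeePC-style prediction sketch. One recorded trajectory is arranged into Hankel matrices; a least-squares combination of recorded trajectory windows that matches the recent past and a candidate future input sequence directly yields the predicted outputs, with no parametric model ever identified. The scalar plant, horizons, and data length are all illustrative choices, and real DeePC adds regularization and constraints on top of this skeleton.

```python
import numpy as np

def hankel(w, depth):
    # Hankel matrix with `depth` rows built from a 1-D signal
    return np.array([w[i:i + len(w) - depth + 1] for i in range(depth)])

rng = np.random.default_rng(0)

# One recorded trajectory of a scalar plant x+ = 0.9 x + u, y = x.
# The model is used only to generate data, never given to the predictor.
T, a = 60, 0.9
u_d = rng.uniform(-1, 1, T)
y_d = np.zeros(T)
x = 0.0
for t in range(T):
    x = a * x + u_d[t]
    y_d[t] = x

T_ini, N = 2, 5                      # past window, prediction horizon
U, Y = hankel(u_d, T_ini + N), hankel(y_d, T_ini + N)
Up, Uf, Yp, Yf = U[:T_ini], U[T_ini:], Y[:T_ini], Y[T_ini:]

# Recent measurements pin down the (implicit) state; pick a candidate
# future input sequence to evaluate.
u_ini, y_ini = u_d[-T_ini:], y_d[-T_ini:]
u_f = 0.5 * np.ones(N)

# Willems' lemma: some combination g of recorded trajectory windows
# matches the past and the chosen future inputs; Yf @ g is the prediction.
A = np.vstack([Up, Yp, Uf])
b = np.concatenate([u_ini, y_ini, u_f])
g, *_ = np.linalg.lstsq(A, b, rcond=None)
y_pred = Yf @ g

# Ground truth from the plant, for comparison; with noise-free data the
# prediction should match closely.
y_true, xs = [], y_d[-1]
for t in range(N):
    xs = a * xs + u_f[t]
    y_true.append(xs)
```

The hard parts listed above show up exactly here: with noisy data the equality constraints must be relaxed and regularized, and with nonlinear plants the linear-combination argument no longer holds exactly.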
2. Safety became its own control layer
The rise of Control Barrier Functions (CBFs) as a standard way to enforce safety constraints online is one of the clearest "post-classical" developments. In practice, people increasingly think in terms of a performance controller + safety filter architecture: MPC, LQR, RL, or another nominal controller proposes an action; a CBF-based quadratic program minimally edits it to keep the state inside a safe set.
What is new is not the basic idea alone, but the extensions: high-order CBFs, adaptive CBFs, disturbance-aware CBFs, input-constrained CBFs, sampled-data and adversarially perturbed settings.
This changes control architecture. Instead of baking everything into one monolithic law, you can separate:
- Stabilization/performance — the nominal controller
- Safety certification — the CBF shield
- Learning/adaptation — the data-driven layer
That decomposition is architecturally significant. It means you can swap, tune, or learn each layer independently while maintaining hard safety guarantees at the shield level.
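For intuition, here is the safety-filter pattern at its smallest: a scalar integrator, a barrier h(x) = x_max − x, and a closed-form projection that only edits the nominal input when the barrier condition would otherwise be violated. Real CBF filters solve an actual QP over vector inputs, but the architecture is the same: the nominal controller proposes, the filter minimally corrects. All constants here are illustrative.

```python
def cbf_filter(x, u_prop, x_max=1.0, alpha=0.5, dt=0.1):
    # Barrier h(x) = x_max - x for the scalar integrator x+ = x + u*dt.
    # The discrete-time CBF condition h(x+) >= (1 - alpha) * h(x)
    # reduces to u <= alpha * h(x) / dt, so the QP projection
    # argmin ||u - u_prop||^2 s.t. that bound is a simple clamp.
    return min(u_prop, alpha * (x_max - x) / dt)

# An aggressive nominal controller keeps proposing u = 2.0; the filter
# leaves it alone far from the boundary and edits it minimally near it.
x, dt = 0.0, 0.1
trace = []
for _ in range(50):
    u = cbf_filter(x, u_prop=2.0, dt=dt)
    x = x + u * dt
    trace.append(x)
# x approaches x_max = 1.0 but never crosses the safe-set boundary
```

Note that the nominal controller never had to know about the constraint; that is what makes the layers independently swappable.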

3. Koopman-based control matured
Koopman methods are a real development beyond standard nonlinear control. The idea is to "lift" nonlinear dynamics into a higher-dimensional representation where the dynamics are approximately linear, then apply linear control machinery there.
Why people care:
- It offers a middle ground between first-principles nonlinear modeling and black-box deep nets
- It is more compatible with linear control toolchains than most learned models
- It can help when local linearization is too weak but full nonlinear MPC is too expensive
The catch: choosing observables/lifted coordinates remains nontrivial, guarantees are often approximate, and performance depends heavily on representation quality. Promising, but not magic. The recent progress is that Koopman MPC has become a substantial research program with explicit error bounds and stability analysis for forced systems.
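A minimal EDMD (extended dynamic mode decomposition) sketch shows the lifting idea. The plant below is deliberately chosen so that a three-observable dictionary gives an exactly linear lift; for generic nonlinear systems the dictionary choice is the hard part and the fit is only approximate, which is precisely the "representation quality" caveat above.

```python
import numpy as np

def step(x):
    # Nonlinear plant chosen so a finite lift is exact:
    # x1+ = 0.9 x1, x2+ = 0.5 x2 + 0.3 x1^2
    return np.array([0.9 * x[0], 0.5 * x[1] + 0.3 * x[0] ** 2])

def lift(x):
    # Dictionary of observables; [x1, x2, x1^2] evolves linearly here
    return np.array([x[0], x[1], x[0] ** 2])

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))

# EDMD: least-squares fit of a linear operator K in lifted coordinates
Psi = np.array([lift(x) for x in X]).T
Psi_next = np.array([lift(step(x)) for x in X]).T
K = Psi_next @ np.linalg.pinv(Psi)

# Multi-step prediction with the linear lifted model vs. the true plant
x = np.array([0.8, -0.4])
z = lift(x)
for _ in range(10):
    x, z = step(x), K @ z
# z[:2] tracks the nonlinear state because the lift is exact here;
# generic systems give only approximate agreement
```

Once K exists, the entire linear toolbox (LQR, linear MPC, spectral analysis) applies in the lifted space, which is the practical appeal.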
4. MPC and reinforcement learning are converging
A real frontier now is not "RL instead of control," but MPC + RL hybrids. The broad idea is that MPC handles short-horizon optimization, constraints, and structure, while RL learns value functions, terminal ingredients, residual policies, or long-horizon improvements.
This matters because plain RL by itself is still too fragile for many real control systems. The newer work is increasingly control-oriented RL — borrowing stability, robustness, constraint handling, and safety ideas from control rather than pretending exploration alone will solve everything.
A good way to think about the evolution:
- Old view: derive controller from model
- Mid-era view: learn policy from interaction
- Current view: embed optimization, constraints, and certificates inside the learned controller
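A toy version of the residual-policy flavor of these hybrids: a fixed nominal feedback law derived from an imperfect model, plus a small learned correction trained against episode cost. Zeroth-order search stands in for the RL algorithm; the plant, gains, and disturbance are all illustrative.

```python
def rollout(residual, d=0.3, steps=30):
    # Plant x+ = x + u + d with an unmodeled constant disturbance d.
    # Nominal feedback u = -0.5 x came from the disturbance-free model,
    # so the closed loop is x+ = 0.5 x + residual + d.
    x, cost = 1.0, 0.0
    for _ in range(steps):
        x = 0.5 * x + residual + d
        cost += x * x
    return cost

# Zeroth-order search stands in for RL: nudge the residual in whichever
# direction lowers episode cost, shrinking the step when neither helps.
r, step = 0.0, 0.05
for _ in range(200):
    if rollout(r + step) < rollout(r):
        r += step
    elif rollout(r - step) < rollout(r):
        r -= step
    else:
        step *= 0.5
# the learned residual roughly cancels the disturbance (r lands near -0.3)
```

The structure is the point: the nominal controller guarantees baseline behavior, and the learned part only has to discover a low-dimensional correction, which is far easier to explore safely than a full policy.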
5. Differentiable control is becoming practical
Differentiable MPC / differentiable predictive control makes the controller or optimizer differentiable end-to-end, so gradients can pass through the control layer into model parameters, policy parameters, or perception modules.
Why this matters:
- You can train controllers jointly with learned dynamics or perception stacks
- You can use control layers inside larger learning systems
- You can get structured policies that retain optimization-based behavior
This is especially relevant in robotics and autonomous systems, where the old boundary between estimation, planning, and control is becoming softer. It's part of a broader "control meets modern ML tooling" trend.
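The pattern can be sketched in a few lines: treat the closed-loop rollout as a differentiable function of a controller parameter and descend its gradient. Finite differences stand in for automatic differentiation here; a differentiable-MPC library would backpropagate through the optimizer itself, and the plant, cost, and learning rate below are illustrative.

```python
def episode_cost(k, a=0.9, steps=40):
    # Roll out x+ = a x + u with u = -k x from x0 = 1 and accumulate
    # a quadratic state/input cost.
    x, cost = 1.0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += x * x + 0.1 * u * u
        x = a * x + u
    return cost

# Gradient descent "through the rollout" on the feedback gain k.
k, lr, eps = 0.0, 0.01, 1e-5
for _ in range(500):
    grad = (episode_cost(k + eps) - episode_cost(k - eps)) / (2 * eps)
    k -= lr * grad
# k converges toward the LQR-like optimum for this cost
```

Replace the scalar gain with MPC weights or a perception network's parameters and the same gradient path is what lets the whole stack be trained jointly.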
6. Distributionally robust control got more central
Robust control used to mean bounded uncertainty or worst-case disturbances. A newer emphasis is distributional uncertainty: not only are disturbances random, but the probability law itself is uncertain. That is where distributionally robust control / distributionally robust MPC comes in.
Conceptually, it sits between classical robust control and stochastic MPC: less conservative than pure worst-case in some formulations, more realistic than assuming a perfectly known noise distribution. Especially relevant in energy systems, autonomous navigation, safety-critical robotics, and data-driven settings where the learned disturbance model is itself uncertain.
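A crude sketch of the distributional hedge: minimize the worst-case expected cost over disturbance laws near the empirical samples. The ambiguity set here is a simple bounded mean-shift of each sample, which is a stand-in for the Wasserstein or moment-based sets used in the actual DRO literature; the cost function and limits are illustrative.

```python
import random

random.seed(0)
# Empirical disturbance samples from a learned (imperfect) model
samples = [random.gauss(0.2, 0.3) for _ in range(50)]

def cost(u, w):
    # track the target u = 1, but pay heavily if u + w exceeds a limit
    return (u - 1.0) ** 2 + 10.0 * max(0.0, u + w - 1.0) ** 2

def worst_case_cost(u, rho):
    # ambiguity set: every disturbance may be shifted by up to rho; the
    # cost is increasing in w, so the worst shift is simply +rho
    return sum(cost(u, w + rho) for w in samples) / len(samples)

def best_u(rho):
    grid = [i / 100 for i in range(-100, 101)]
    return min(grid, key=lambda u: worst_case_cost(u, rho))

u_nominal = best_u(0.0)   # trusts the empirical distribution as-is
u_robust = best_u(0.2)    # hedges against a misestimated disturbance law
# the robust choice backs further away from the constraint boundary
```

With rho = 0 this collapses to stochastic optimization over the empirical law; as rho grows it approaches worst-case robust control, which is exactly the spectrum described above.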
7. Learned dynamics models are mainstream
People are increasingly comfortable using neural state-space models, learned latent dynamics, and hybrid gray-box models as part of the control loop. The emphasis is usually not "throw away physics," but "use learned dynamics where first-principles models are weak or too expensive."
This changes controller design because the problem becomes:
- Learn a dynamics representation
- Quantify uncertainty or error
- Wrap it in MPC / safety filter / robust layer
That workflow is much more common now than a decade ago, and it is one of the clearest bridges from classical control to ML-native systems.
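The three-step workflow fits in a few lines for a scalar plant. The "learned dynamics" here is just a linear least-squares fit, the uncertainty quantification is a worst-case residual on the data, and the wrapper tightens a state limit by that bound; all three ingredients are deliberately simplified stand-ins for their real counterparts.

```python
import numpy as np

rng = np.random.default_rng(2)

def plant(x, u):
    # "true" dynamics: mostly linear, with a small unmodeled term
    return 0.8 * x + u + 0.1 * np.sin(3 * x)

# 1. Learn a dynamics representation (here: a linear fit) from data
X = rng.uniform(-1, 1, 200)
U = rng.uniform(-1, 1, 200)
A = np.vstack([X, U]).T
theta, *_ = np.linalg.lstsq(A, plant(X, U), rcond=None)
a_hat, b_hat = theta

# 2. Quantify the model error (here: worst residual on the data)
err_bound = np.abs(plant(X, U) - A @ theta).max()

# 3. Wrap it: a one-step move toward a target, with the state limit
#    tightened by the error bound so the true plant stays inside it
x, target, x_lim = 0.0, 2.0, 0.9
u = (target - a_hat * x) / b_hat                      # model-based proposal
u = min(u, (x_lim - err_bound - a_hat * x) / b_hat)   # robust tightening
x_next = plant(x, u)
# x_next respects x_lim as long as the data-driven bound holds off-sample
```

The caveat in the last comment is the whole research area: turning an empirical residual bound into a guarantee that holds away from the data.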
8. Safe learning is now first-class
There is now a much stronger emphasis on learning with guarantees, especially in robotics and multi-agent systems. The question is no longer only "can it learn a good controller?" but "can it learn while respecting state/input constraints, collision avoidance, and liveness requirements?"
This is a real departure from older adaptive-control culture. Historically, adaptation focused heavily on convergence/stability. The newer literature adds: exploration safety, certified safe sets, runtime shielding, and integration with formal methods and barrier certificates.
9. Digital twins are becoming control infrastructure
In industry, digital twins are increasingly tied to prediction, monitoring, and adaptive control instead of just offline simulation. The field is moving toward twins that continuously assimilate data and support optimization/control decisions.
That means the real architecture is often:
physical plant ↔ state estimation ↔ twin/learned model ↔ optimizer/controller ↔ safety layer
For real deployments, this may matter more than a single new control law, because it changes how control is embedded in a larger cyber-physical stack.
10. What is actually new versus repackaging
Here is the blunt assessment.
Actually important and durable:
- Data-driven MPC / DeePC
- Safety filters via CBFs
- Koopman-based control as a nonlinear-middle-ground tool
- Learning-based MPC and MPC-RL hybrids
- Distributionally robust formulations
- Learned dynamics wrapped with guarantees
Important but still unsettled:
- Differentiable control as a general recipe
- Foundation-model style control stacks
- End-to-end learned controllers without strong structure
Not a replacement for classical control:
None of this kills PID, observers, LQR/LQG, H-infinity, MPC, ADRC, or delay compensation. In practice, the new methods are usually layered on top of those foundations, not replacing them.
The modern stack
If I had to compress "what's new since ADRC/TDC" into one sentence: the frontier moved from hand-derived controllers for uncertain plants toward architectures that combine optimization, data, learned models, and explicit safety certificates.
The stack looks like this:
| Layer | Components |
|---|---|
| Base dynamics/control | State-space, observers, robust control, MPC |
| Data layer | Learned dynamics, DeePC, Koopman lifting |
| Safety layer | CBFs, safe sets, shields |
| Uncertainty layer | Stochastic or distributionally robust optimization |
| Learning layer | RL, residual learning, differentiable control |
| Systems layer | Digital twin, estimation, deployment, online adaptation |
Part 2: The Agentic Control Metalayer
Everything in Part 1 describes what's happening in control theory. But there's a parallel development in AI systems: agents — LLM-based systems that use tools, make decisions, and operate in loops. The question is how these two worlds connect.
The answer is architectural. An LLM should not be the servo controller. It should be the slow, supervisory, tool-using controller that emits typed, auditable control decisions — while fast inner loops (PID/MPC/CBF-QP) execute deterministically.
Agent as control law
Let the plant be a partially observed stochastic dynamical system with state x, control input u, disturbance w, and observation y. An agentic controller is a tool-using policy operating on a typed belief state derived from observations and logs.
The LLM-generated decision is not the raw u (except in very slow plants). Instead, the LLM emits a structured control directive θ that parameterizes deterministic control modules:
- MPC weights, horizon, constraints, reference trajectories
- CBF barrier parameters, class-K function tuning
- Model update requests (Koopman lift changes, retraining triggers)
- Selection among controllers (switching logic)
A deterministic controller module produces a candidate control sequence from θ. A safety shield then projects candidate inputs into the safe set via CBF-QP:
u_safe = argmin ||u - u_proposed||² subject to SafetyConstraints(belief, u)
The runtime logs every decision into an append-only trace ledger, and repeats.
This architecture intentionally limits the LLM's degrees of freedom to what can be reliably evaluated and constrained. The agent gets only as much mutation freedom as the evaluator can judge.
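The typed boundary can be sketched directly. The field names and bounds below are illustrative, not a proposed standard: the point is that the LLM's output is parsed and validated before any controller sees it, and anything outside the schema or its ranges is rejected rather than "interpreted".

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlDirective:
    horizon: int          # MPC horizon, steps
    q_weight: float       # state cost weight
    cbf_margin: float     # safety-set margin handed to the shield
    mode: str             # named controller to activate

    def validate(self):
        if not 1 <= self.horizon <= 50:
            raise ValueError("horizon out of range")
        if not 0.0 < self.q_weight <= 100.0:
            raise ValueError("q_weight out of range")
        if not 0.0 <= self.cbf_margin <= 0.5:
            raise ValueError("cbf_margin out of range")
        if self.mode not in {"mpc", "pid", "hold"}:
            raise ValueError("unknown mode")
        return self

def parse_directive(llm_output: str) -> ControlDirective:
    # Unknown keys raise TypeError; out-of-range values raise ValueError.
    return ControlDirective(**json.loads(llm_output)).validate()

directive = parse_directive(
    '{"horizon": 20, "q_weight": 1.5, "cbf_margin": 0.1, "mode": "mpc"}'
)
```

Everything downstream of `parse_directive` is deterministic, which is what makes the directive auditable and replayable from the trace ledger.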
Multi-rate hierarchy
The critical design choice is where in the loop hierarchy the LLM sits:
| Loop type | Cadence | LLM here? | Rationale |
|---|---|---|---|
| Servo stabilization | milliseconds | No | Requires deterministic deadlines |
| Constrained control (MPC/CBF-QP) | 10–100 ms | No | Solve QPs/NLPs deterministically; LLM tunes weights |
| Supervisory planning / mode switching | seconds | Yes | Aligns with tool-driven agents, typed actions |
| Auto-tuning / controller synthesis (evaluator-governed recursive improvement, EGRI) | minutes–days | Yes | Requires evaluator-first + rollback + ledger |
In most physical/fast systems, the LLM should output controller parameters and plans that deterministic modules execute. In slower cyber "plants" (cloud ops, workflow routing), an LLM can act closer to the control law — but still requires harnesses, verifiers, and rollback.
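The multi-rate separation implies a concrete rule: the fast loop never blocks on the slow supervisor. A sketch, with a stubbed supervisor standing in for the LLM and all cadences and bounds illustrative: the inner loop always runs with the last validated directive and reverts to a conservative default when the supervisor goes quiet for too long.

```python
def supervisor(tick):
    # Stub for the slow LLM loop: answers every 10 ticks, misses one window
    if tick % 10 == 0 and tick != 30:
        return {"gain": 0.5}
    return None

fallback = {"gain": 0.1}           # conservative default directive
directive, last_update = fallback, 0
x, log = 1.0, []
for tick in range(50):
    update = supervisor(tick)
    if update is not None:
        directive, last_update = update, tick
    elif tick - last_update > 15:  # staleness bound on the directive
        directive = fallback
    x = x - directive["gain"] * x  # fast loop always executes on time
    log.append(directive["gain"])
```

The staleness bound is the cyber analogue of a watchdog timer: supervisor latency degrades performance, never safety or liveness of the inner loop.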

LLM roles in the control stack
The LLM can play several roles. The right choice depends on what it outputs and how that output is verified:
- Supervisory controller: setpoints, mode switches, constraints, policy updates. Strong for long-horizon reasoning and goals-to-constraints translation. Needs typed schemas and audit gates.
- Meta-controller over tools: chooses which control module to invoke (PID/MPC/DeePC/Koopman/RL), triggers identification. Modular, supports policy switching — but requires strict allowed-tools lists.
- Online identifier: decides what data to collect, when to update models, what experiments to run. Good at experiment design and anomaly interpretation — but unsafe probing must be gated by budget and safety constraints.
- Controller synthesizer: writes/edits controller code, safety specs, unit tests, config. Converts reasoning into deterministic artifacts — but code-gen errors require harness gating and CI audits.
- EGRI loop compiler: designs problem-specs, mutation operators, evaluator logic, promotion rules. Makes "improve this controller" a safe closed-loop process — but requires strong evaluators and anti-gaming checks.
Blueprint architecture
The architecture treats the governance/harness/orchestration stack as the operating system and adds a control-and-world-model kernel:
Governance layer — setpoints, policies, audit gates, trace ledger. Provides the behavioral envelope. Analogous to .control/policy.yaml with setpoint IDs, measurement targets, and severity levels.
Harness layer — deterministic test/lint/typecheck scripts, observability contracts, entropy checks. Provides the reliable measurement function and experiment protocol.
Orchestration layer — daemon scheduler (poll/dispatch/reconcile), isolated workspace manager with safety invariants, status surface. Handles multi-agent coordination.
Control kernel — the new piece:
- Plant interface (typed state/action schemas)
- Observer / state estimator
- World models: Koopman, learned dynamics, digital twin
- MPC / DeePC planners
- Safety shield: CBF-QP constraint filters
- Robust / DRO scenario engine
Auto-improvement layer — Evaluator-Governed Recursive Improvement (EGRI). Problem-spec compiler, evaluator + constraints, promotion/rollback policy. Controller tuning as an explicit bounded closed-loop optimization.

Control-flow: a single tick
For each supervisory cycle:
- Runtime observes the plant → gets y_t
- Runtime updates the estimator → gets belief state b_t
- Runtime sends typed state summary to the LLM agent
- LLM returns a control directive θ_t + tool choice
- Controller module produces proposed u_t from (b_t, θ_t)
- Safety shield filters proposed u_t → safe u_t + certificate
- Runtime applies safe u_t to the plant
- Trace sink logs the full trace entry (proposed vs applied, constraints checked, metrics)
The LLM never directly calls the plant. It calls only Controller or MetaController tools with strict schemas. The runtime handles all plant interactions and safety enforcement.
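The full tick can be sketched end to end. Every function here is a deliberately trivial stand-in: the "LLM" is a stub returning a fixed directive where a real system would make a schema-validated tool call, the plant is a scalar integrator, and the shield is the scalar clamp form of a CBF-QP.

```python
def observe(x):            return x                  # y_t (noise omitted)
def estimate(y):           return {"x_hat": y}      # belief b_t
def llm_directive(belief): return {"gain": 0.8, "x_max": 1.0}
def controller(b, theta):  # deterministic proposal: track setpoint 0.9
    return -theta["gain"] * (b["x_hat"] - 0.9)
def shield(b, theta, u):
    # scalar CBF-style clamp: keep x + u below x_max, record whether
    # the proposal had to be edited (the "certificate")
    u_safe = min(u, theta["x_max"] - b["x_hat"])
    return u_safe, {"edited": u_safe != u}

x, ledger = 0.0, []
for t in range(20):
    y = observe(x)
    b = estimate(y)
    theta = llm_directive(b)                  # slow, typed decision
    u_prop = controller(b, theta)             # deterministic proposal
    u, cert = shield(b, theta, u_prop)        # deterministic safety edit
    x = x + u                                 # plant: x+ = x + u
    ledger.append({"t": t, "proposed": u_prop, "applied": u, **cert})
```

The ledger records both the proposed and applied inputs, so any divergence between what the directive asked for and what the shield allowed is auditable after the fact.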
Mapping modern control to agent components
Each control technique from Part 1 has a natural home in this architecture:
| Technique | Agent component | LLM's best role |
|---|---|---|
| DeePC | Dataset store + optimizer tool | Experiment design, horizon/regularization tuning |
| CBF shields | Hard safety filter (QP) | Choose constraints/margins; never bypass |
| Koopman + MPC | World model + planner | Dataset curation, retraining triggers |
| MPC-RL hybrids | Proposal generator + eval harness | Tune MPC weights; policy search under evaluator |
| Differentiable MPC | Learning pipeline primitive | Generate model structures, training harness scripts |
| DRO | Robust MPC module + scenario engine | Curate scenario sets, choose robustness tradeoffs |
| Digital twins | Harness for safe experimentation | Orchestrate sim experiments; interpret mismatches |
Failure modes
Common failures when LLMs participate in control — and their mitigations:
Spec/constraint hallucination: LLM invents constraints, misreads units, or forgets invariants. Mitigation: JSON-schema structured outputs + strict tool schemas + policy gates.
Unsafe exploration: LLM runs aggressive identification experiments. Mitigation: EGRI budgets + hard constraints + CBF shield; enforce evaluator-first and sandbox modes.
Latency spikes: tool runtimes reject or queue requests under load. Mitigation: multi-rate design; fallback controllers; don't place LLM in fast loops.
Evaluator gaming: the agent learns to exploit metric loopholes in the outer improvement loop. Mitigation: holdout scenario sets, adversarial tests, immutable evaluator artifacts.
The practical mental model
The modern control stack for agent systems:
┌─────────────────────────────────────────┐
│ EGRI / Auto-Improvement (minutes–days) │ ← LLM designs improvement loops
├─────────────────────────────────────────┤
│ Supervisory Planning (seconds) │ ← LLM emits typed directives
├─────────────────────────────────────────┤
│ Safety Shield / CBF-QP (per-tick) │ ← deterministic, hard constraints
├─────────────────────────────────────────┤
│ MPC / DeePC / Koopman (10–100ms) │ ← deterministic optimization
├─────────────────────────────────────────┤
│ Servo / PID / State Feedback (ms) │ ← deterministic inner loops
├─────────────────────────────────────────┤
│ Plant / Digital Twin │ ← physical or cyber system
└─────────────────────────────────────────┘
Classical control is the foundation. The new methods are layered on top. The LLM sits at the top — slow, supervisory, constrained by typed schemas and safety shields — making the decisions that require reasoning, while everything below executes deterministically.
The frontier didn't replace what came before. It built a stack on top of it.
This post is part of a series on control-systems thinking applied to agent architecture. Previous entries: Control Systems as Self-Engineering (2019), The Control Metalayer (2026). The agentic-control-kernel skill implements the blueprint described in Part 2.