Your AI agents need a Certificate of Airworthiness

I spent 25 years keeping aircraft fit to fly. Now I keep thinking about AI agents the same way — and I can't unsee it.

In aviation we don't certify an aircraft once and trust it forever. We can't. The airframe ages, parts wear, the operating environment shifts, and new hazards surface that nobody anticipated at design time. So we built a discipline called continuing airworthiness: an aircraft stays registered, monitored, maintained and provably fit for service across its entire life — and the moment risk becomes unacceptable, we ground it. No debate. It stops flying until someone qualified signs it back into service.

Most enterprises are governing AI agents the way aviation governed aircraft a hundred years ago: approve it once, deploy it, hope for the best.

That model is already breaking. An AI agent isn't a static model you validate and forget. It acts autonomously, its actions carry real operational consequence, and it persists in production where its tools, data and threat surface keep changing underneath it. Certification at birth is necessary. It is nowhere near sufficient.

So here's the idea I've been developing: Continuous Agentworthiness — the ongoing process of ensuring an AI agent remains authorized, secure, traceable, compliant, bounded and fit for its intended purpose throughout its operational lifecycle.

Eight pillars, each borrowed from something aviation already does well:

Agent Registration — who owns it, what's its business purpose, what's its risk rating (like aircraft registration).
Identity & Credentials — issued identity, secrets and certificates (the Certificate of Airworthiness).
Authorization & Access Control — least-privilege access to data, tools and systems (operating authorizations).
Configuration Control — every prompt, model, tool and RAG-source change is tracked (Part-21 design changes).
Monitoring & Observability — is behavior drifting, is output degrading, are permissions being abused (continuing airworthiness monitoring).
Agent Directives — when a vendor publishes a security bulletin, the organization issues a mandatory directive with a deadline and proof of closure, exactly like an Airworthiness Directive. Every affected agent must comply.
Incident Reporting — hallucinations, unauthorized actions, prompt injection, data leakage — captured and severity-classified (occurrence reporting).
Grounding & Return to Service — on breach or dangerous behavior the agent is disabled immediately; it returns to active only after root cause, security review, testing and approval (a Certificate of Release to Service).
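
To make the first and last pillars concrete, here is a minimal sketch of a registry record with a grounding switch and a gated return to service. Everything in it is my own illustration, not an existing library: the `AgentRecord` class, its field names and the four gate names are assumptions lifted straight from the pillar descriptions above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    GROUNDED = "grounded"


# Hypothetical gate names, mirroring the return-to-service pillar.
RETURN_TO_SERVICE_GATES = frozenset(
    {"root_cause", "security_review", "testing", "approval"}
)


@dataclass
class AgentRecord:
    # Registration data: owner, purpose and risk rating.
    agent_id: str
    owner: str
    business_purpose: str
    risk_rating: str
    status: Status = Status.ACTIVE
    grounding_reason: Optional[str] = None
    completed_gates: set = field(default_factory=set)

    def ground(self, reason: str) -> None:
        """Disable the agent immediately; grounding needs no prior approval."""
        self.status = Status.GROUNDED
        self.grounding_reason = reason
        self.completed_gates = set()  # every gate must be re-earned

    def complete_gate(self, gate: str) -> None:
        """Record one finished return-to-service step for a grounded agent."""
        if self.status is not Status.GROUNDED:
            raise RuntimeError("agent is not grounded")
        if gate not in RETURN_TO_SERVICE_GATES:
            raise ValueError(f"unknown gate: {gate}")
        self.completed_gates.add(gate)

    def return_to_service(self) -> None:
        """Reactivate only when every gate has been completed."""
        missing = RETURN_TO_SERVICE_GATES - self.completed_gates
        if missing:
            raise PermissionError(f"gates still open: {sorted(missing)}")
        self.status = Status.ACTIVE
        self.grounding_reason = None
```

The asymmetry is the point of the design: `ground` always succeeds, while `return_to_service` refuses until all four gates are recorded.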

Two of these are where the framework earns its keep. Agent Directives give you fleet-wide mandatory action with a deadline and evidence, not the ad-hoc patching most teams actually do. And Grounding & Return to Service gives you the thing most AI governance frameworks quietly lack: a pre-authorized power to pull an agent instantly, and a gated path back. Detection without that authority is just a nicer post-mortem.
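
An Agent Directive could be tracked along these lines. This is a rough sketch under my own assumptions: the `Agent` and `AgentDirective` classes, the idea of matching agents by a component string, and the free-text proof of closure are all hypothetical, not taken from any standard or product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Agent:
    agent_id: str
    components: frozenset  # hypothetical labels, e.g. "tool:browser"


@dataclass
class AgentDirective:
    directive_id: str
    affected_component: str
    deadline: date
    # agent_id -> proof of closure (ticket link, test report, ...)
    evidence: dict = field(default_factory=dict)

    def applies_to(self, agent: Agent) -> bool:
        return self.affected_component in agent.components

    def close(self, agent: Agent, proof: str) -> None:
        """Record compliance; a closure without evidence is rejected."""
        if not self.applies_to(agent):
            raise ValueError(f"{agent.agent_id} is not affected")
        if not proof.strip():
            raise ValueError("proof of closure is required")
        self.evidence[agent.agent_id] = proof

    def open_agents(self, fleet) -> list:
        """Affected agents that have not yet shown proof of closure."""
        return [
            a.agent_id
            for a in fleet
            if self.applies_to(a) and a.agent_id not in self.evidence
        ]

    def overdue_agents(self, fleet, today: date) -> list:
        """Open agents once the deadline has passed; grounding candidates."""
        return self.open_agents(fleet) if today > self.deadline else []
```

Run across the whole registry, `overdue_agents` gives you the list an Airworthiness Directive would give a fleet manager: who is affected, who has not complied, and who is now out of time.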

To be clear about what this is and isn't: I'm not proposing a replacement for NIST AI RMF, ISO/IEC 42001 or the EU AI Act. Those are the control catalogues. Agentworthiness is the operating discipline that wraps them — the lifecycle, the mandatory-action mechanism, the grounding authority.

And one honest limitation, because pretending otherwise would be bad engineering: aircraft systems are largely deterministic; LLM agents are stochastic. "Is behavior normal?" is comparatively easy to answer for an engine, where you measure against fixed limits, and genuinely hard for an agent. We can't make agents perfectly predictable. We can make them bounded and observable, through evaluation suites, regression testing and output scoring rather than vibration limits. That's the honest claim.
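
One way to put a number on "bounded" is to score the agent repeatedly against a fixed evaluation suite and compare the pass rate with an accepted baseline. The sketch below assumes a hypothetical `agent` callable, a hypothetical `scorer`, and a tolerance I picked purely for illustration; real suites would need far richer scoring.

```python
from typing import Callable, Sequence, Tuple

Case = Tuple[str, str]  # (prompt, expected output)


def pass_rate(
    agent: Callable[[str], str],
    cases: Sequence[Case],
    scorer: Callable[[str, str], bool],
    trials: int = 5,
) -> float:
    """Run every case several times, since one sample says little
    about a stochastic agent, and return the fraction that passed."""
    if not cases or trials < 1:
        raise ValueError("need at least one case and one trial")
    passed = 0
    for prompt, expected in cases:
        for _ in range(trials):
            if scorer(agent(prompt), expected):
                passed += 1
    return passed / (len(cases) * trials)


def within_bounds(rate: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True while the pass rate has not dropped more than `tolerance`
    below the baseline accepted at the last approval."""
    return rate >= baseline - tolerance
```

A result outside the bounds does not prove the agent is unsafe. It is a trigger for review, in the same way an exceeded limit triggers an inspection rather than a verdict.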

Why does the aviation framing matter? Because most of us explain AI governance with NIST, OWASP and IAM diagrams, and most executives nod politely without internalizing it. Say "we need continuing airworthiness for our AI agents" and they get it instantly: registration, monitoring, compliance, maintenance, incident reporting, grounding and return to service, all in one phrase, from an industry whose entire credibility rests on safety.

I don't think the value is the name. The value is that aviation already solved this class of problem, with decades of regulatory rigor behind it, and almost nobody is translating that into AI.

If you're building or governing AI agents — does the airworthiness lens help, or am I stretching the analogy too far? Genuinely want to hear where it breaks.