Security leaders need a way to measure where they stand on AI agent governance. Not a maturity model with five levels of aspirational descriptions. A practical scoring framework that produces a number, identifies gaps, and points to specific actions.
This is a framework I've developed from conversations with security teams at regulated European enterprises. It has eight dimensions, each scored 0 to 4, for a total score of 0 to 32. It is not comprehensive enough to be a formal standard, yet too substantive to be marketing. It sits in between: a useful starting point for a conversation between the CISO and the board, or between the security team and the engineering teams deploying AI agents.
The eight dimensions
Inventory completeness
Do you know how many AI agents are running in your organization?
- 0: No inventory exists. Nobody knows how many agents are deployed or where.
- 1: Partial inventory based on team surveys. Known to be incomplete.
- 2: Maintained inventory of officially sanctioned agents. Shadow AI not covered.
- 3: Comprehensive inventory with automated detection of new agent deployments.
- 4: Real-time inventory with automatic registration, classification, and owner assignment for every agent that connects.
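To make level 4 concrete, here is a minimal Python sketch of registration on connect. The record fields and names are illustrative assumptions, not a prescribed schema; the point is that registration sits in the connection path, so an agent cannot run without an inventory entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical record shape for a level-4 inventory: every agent that
# connects is registered, classified, and assigned an owner.
@dataclass
class AgentRecord:
    agent_id: str
    owner: str             # accountable team or person
    classification: str    # e.g. "customer-facing", "internal-batch"
    first_seen: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

inventory: dict[str, AgentRecord] = {}

def register_on_connect(agent_id: str, owner: str, classification: str) -> AgentRecord:
    """Idempotent auto-registration, called from the connection path."""
    if agent_id not in inventory:
        inventory[agent_id] = AgentRecord(agent_id, owner, classification)
    return inventory[agent_id]
```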
Identity and access control
Can you distinguish one agent from another? Can you scope their permissions individually?
- 0: Agents use shared API keys. No individual identity.
- 1: Some agents have dedicated API keys, but permissions are not scoped per agent.
- 2: Each agent has a unique identity. Permissions are defined but not dynamically enforced.
- 3: Per-agent identity with enforced permissions: model access, tool restrictions, data scope, budget limits.
- 4: Cryptographic identity (SPIFFE/JWT) with short-lived credentials, automatic rotation, and instant revocation capability.
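A minimal sketch of the level 3-4 pattern: a short-lived, per-agent JWT carrying scoped permissions, using the PyJWT library. The claim names, scopes, and values are assumptions for illustration; a SPIFFE-based deployment would carry the same information in an SVID rather than a hand-minted token.

```python
import time
import jwt  # PyJWT

SIGNING_KEY = "replace-with-a-managed-secret"  # in practice: KMS/Vault, rotated

def mint_agent_token(agent_id: str) -> str:
    """One verifiable token = identity plus scoped permissions.
    The 15-minute lifetime forces rotation and makes revocation fast."""
    now = int(time.time())
    claims = {
        "sub": agent_id,
        "iat": now,
        "exp": now + 900,                 # short-lived credential
        "models": ["gpt-4o"],             # model allowlist (illustrative)
        "tools": ["search", "crm.read"],  # tool restrictions (illustrative)
        "data_scope": "customer:eu",      # data scope (illustrative)
        "budget_eur": 50,                 # spend ceiling per window
    }
    return jwt.encode(claims, SIGNING_KEY, algorithm="HS256")
```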
Runtime monitoring
Can you see what your agents are doing right now?
- 0: No monitoring beyond model provider dashboards (total tokens, total cost).
- 1: Application-level logging of prompts and responses. Not structured for analysis.
- 2: Structured logging of agent actions including tool calls and data access. Searchable.
- 3: Real-time monitoring with anomaly detection: budget overruns, unusual tool usage, scope violations.
- 4: Continuous monitoring with automated response: alerts, throttling, session termination, escalation to human review.
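The jump from level 1 to levels 2-3 looks roughly like this in code: one structured JSON event per agent action, plus a budget-overrun check as one example of an anomaly signal. Agent names, budget figures, and event fields are assumptions for illustration.

```python
import json
import logging
from datetime import datetime, timezone

log = logging.getLogger("agent.actions")
logging.basicConfig(level=logging.INFO, format="%(message)s")

BUDGET_EUR = {"support-bot": 50.0}   # per-agent budget, illustrative
spend_eur: dict[str, float] = {}

def record_action(agent_id: str, action: str, detail: dict, cost_eur: float = 0.0) -> None:
    """Level 2: one structured, searchable event per agent action.
    The overrun check at the end is a minimal level-3 anomaly signal."""
    spend_eur[agent_id] = spend_eur.get(agent_id, 0.0) + cost_eur
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "action": action,        # e.g. "tool_call", "data_access"
        "detail": detail,
        "cost_eur": cost_eur,
    }
    log.info(json.dumps(event))
    if spend_eur[agent_id] > BUDGET_EUR.get(agent_id, float("inf")):
        log.warning(json.dumps({"alert": "budget_overrun", "agent": agent_id}))
```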
Policy enforcement
Are agent permissions enforced at runtime, or just documented?
- 0: No policy exists. Agents can do whatever the code allows.
- 1: Written policies exist but are enforced by developer discipline, not by infrastructure.
- 2: Some controls enforced (e.g., model allowlists), but tool access and data scope are not.
- 3: Comprehensive policy enforcement at a gateway layer: model, tool, data, budget, and temporal restrictions.
- 4: Dynamic policy with context-aware enforcement: policies adapt based on data sensitivity, time of day, session behavior, and risk signals.
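The difference between level 1 and level 3 is where the check runs. Here is a minimal, deny-by-default authorization function of the kind a gateway could run in the request path. The policy fields mirror the identity dimension above; the policy store and values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AgentPolicy:
    models: set[str]
    tools: set[str]
    data_scopes: set[str]
    budget_eur: float

# Illustrative policy store; in practice this lives in the gateway layer.
POLICIES = {
    "support-bot": AgentPolicy({"gpt-4o"}, {"search", "crm.read"}, {"customer:eu"}, 50.0),
}

def authorize(agent_id: str, model: str, tool: str, scope: str, spent_eur: float) -> bool:
    """Deny by default, enforced in the request path rather than by
    developer discipline. Level 4 would make these sets functions of
    context (risk signals, time of day, session behavior)."""
    policy = POLICIES.get(agent_id)
    if policy is None:
        return False  # unknown agent: refuse
    return (
        model in policy.models
        and tool in policy.tools
        and scope in policy.data_scopes
        and spent_eur < policy.budget_eur
    )
```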
Audit trail quality
Could you reconstruct exactly what an agent did last Tuesday?
- 0: No audit trail beyond model provider logs.
- 1: Application logs exist but are incomplete: missing tool calls, data access, or decision context.
- 2: Complete audit trail of all agent actions. Mutable storage (could be modified after the fact).
- 3: Complete, immutable audit trail with tamper detection. Queryable by agent, time, data classification, and action type.
- 4: Auditor-ready trail with hash chaining, independent verification, and pre-built reports that map to regulatory requirements (GDPR Art. 30, EU AI Act Art. 12, DORA).
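Hash chaining is simple enough to sketch in a few lines. Each entry commits to its predecessor's hash, so modifying or reordering any past entry breaks verification from that point forward. This is a minimal illustration, not a production audit store.

```python
import hashlib
import json

def append_entry(chain: list[dict], event: dict) -> dict:
    """Append an event that commits to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(event, sort_keys=True)
    entry = {
        "event": event,
        "prev": prev_hash,
        "hash": hashlib.sha256((prev_hash + body).encode()).hexdigest(),
    }
    chain.append(entry)
    return entry

def verify(chain: list[dict]) -> bool:
    """Recompute every link; True only if nothing was modified or reordered."""
    prev_hash = "0" * 64
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True)
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256((prev_hash + body).encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```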
Human oversight
Can a human intervene in agent operations when needed?
- 0: No intervention capability. Agents operate autonomously with no human touchpoint.
- 1: Manual intervention possible by stopping the agent process. No graceful mechanism.
- 2: Kill switch exists for each agent. No granular control (all-or-nothing).
- 3: Approval queues for high-risk actions. Agents can pause and wait for human decisions.
- 4: Tiered oversight: automatic approval for low-risk actions, human review for medium-risk, mandatory approval for high-risk. Dashboard for real-time visibility and intervention.
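A sketch of the level-4 routing logic. The action names and their risk assignments are assumptions; in practice, risk would be derived from action type, data sensitivity, and amounts rather than a hardcoded table.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Illustrative risk classification for a customer-support agent.
ACTION_RISK = {
    "read_kb_article": Risk.LOW,
    "send_customer_email": Risk.MEDIUM,
    "issue_refund": Risk.HIGH,
}

def route(action: str) -> str:
    """Tiered oversight: auto-approve low risk, queue medium risk for
    human review, block high risk until explicitly approved."""
    risk = ACTION_RISK.get(action, Risk.HIGH)  # unknown actions default to high
    if risk is Risk.LOW:
        return "auto_approve"
    if risk is Risk.MEDIUM:
        return "human_review_queue"
    return "await_mandatory_approval"
```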
Incident response
What happens when an agent does something wrong?
- 0: No incident response process for AI agent events.
- 1: AI incidents handled ad hoc by the development team. No formal process.
- 2: AI incidents included in existing incident response process. Response steps not AI-specific.
- 3: Dedicated AI incident playbooks covering containment, investigation, remediation, and post-incident review.
- 4: Automated incident detection and initial response. Integration with SIEM/SOAR. Regulatory reporting workflows tested and ready.
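A sketch of what the automated first minutes of a level-4 response could look like. The helper functions here are stubs standing in for real IAM, gateway, and SIEM/SOAR integrations; the sequence (revoke, freeze, open a tracked incident) is the part that matters.

```python
# Stubs standing in for real IAM, gateway, and SIEM/SOAR integrations.
def revoke_credentials(agent_id: str) -> None:
    print(f"[iam] revoked credentials for {agent_id}")

def suspend_sessions(agent_id: str) -> None:
    print(f"[gateway] suspended sessions for {agent_id}")

def open_incident(title: str, severity: str) -> dict:
    print(f"[soar] opened incident: {title} ({severity})")
    return {"title": title, "severity": severity, "status": "open"}

def contain(agent_id: str, reason: str) -> dict:
    """Automated initial response: cut access, freeze activity, and
    create a tracked incident for human investigation."""
    revoke_credentials(agent_id)
    suspend_sessions(agent_id)
    return open_incident(
        title=f"AI agent containment: {agent_id} ({reason})",
        severity="high",
    )
```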
Third-party risk management
How well do you manage the risks from model providers and AI tool vendors?
- 0: No assessment of model provider risks. Standard terms accepted without review.
- 1: Model providers included in vendor risk assessment. Assessment not AI-specific.
- 2: AI-specific risk assessment covering data handling, model updates, availability, and security.
- 3: Contractual protections in place: data processing agreements, audit rights, incident notification, model change notification.
- 4: Multi-provider strategy with failover. Model provider concentration risk assessed and mitigated. Exit plans documented and tested.
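Failover is the easiest of these to sketch. The provider functions below are stubs standing in for real SDK calls; the point is that the fallback path is code you can exercise regularly, not a paragraph in an exit plan.

```python
# Stubs standing in for real model provider SDK calls.
def primary_provider(prompt: str) -> str:
    raise TimeoutError("primary unavailable")  # simulate an outage

def secondary_provider(prompt: str) -> str:
    return f"answer from secondary: {prompt!r}"

def complete_with_failover(prompt: str) -> str:
    """Try providers in preference order; an untested exit plan is
    just documentation, so this path should run in regular drills."""
    for provider in (primary_provider, secondary_provider):
        try:
            return provider(prompt)
        except Exception:
            continue  # in production: log the failure before moving on
    raise RuntimeError("all model providers failed")
```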
Interpreting the score
0-8: Unprotected. AI agents are operating without governance. The organization is exposed to financial, regulatory, and reputational risk from AI agent behavior. Immediate action required: start with inventory and identity.
9-16: Emerging. Some governance exists but significant gaps remain. Common patterns: monitoring exists but enforcement doesn't, or policies are written but not enforced at runtime. Priority: close the enforcement gap and build the audit trail.
17-24: Established. Core governance controls are in place. The organization can monitor, control, and audit its AI agents. Remaining gaps are typically in incident response and third-party risk management. Priority: test the controls under realistic conditions.
25-32: Advanced. Comprehensive AI agent governance with dynamic policy, automated response, and auditor-ready evidence. The organization is positioned for regulatory compliance and can demonstrate governance maturity to customers and partners.
Where most organizations score
Based on conversations with security teams at 20+ European enterprises, the typical score is between 4 and 10. The most common pattern:
- Inventory: 1 (partial, survey-based)
- Identity: 0-1 (shared keys, no per-agent identity)
- Monitoring: 1 (application logs, not structured)
- Enforcement: 0-1 (documented but not enforced)
- Audit trail: 1 (incomplete, mutable)
- Oversight: 1 (kill switch only)
- Incident response: 1 (ad hoc, no playbooks)
- Third-party: 1 (included in vendor assessment, not AI-specific)
That totals 6 to 8 out of 32. Unprotected. This is not because these teams are negligent. It is because AI agent governance is a new discipline and the tooling is still emerging. Most organizations are doing what they can with what they have. The framework helps identify where the gaps are largest and where investment will have the most impact.
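For concreteness, here is the arithmetic behind that range, taking identity and enforcement at their low end:

```python
# The typical profile above, as numbers. Identity and enforcement are
# taken at 0 here, giving the floor of the 6-8 range; at 1 each, the sum is 8.
typical = {
    "inventory": 1, "identity": 0, "monitoring": 1, "enforcement": 0,
    "audit_trail": 1, "oversight": 1, "incident_response": 1, "third_party": 1,
}
print(sum(typical.values()), "/ 32")  # 6 / 32 -- "Unprotected" under the bands above
```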
Using the score
The score is useful in three contexts.
Board reporting. A single number that communicates the organization's AI governance posture. "We scored 7 out of 32 on the AI readiness assessment. Our target is 20 by Q4 2027. Here's the plan." This is the kind of communication boards can act on.
Prioritization. The dimension with the lowest score is usually the highest priority. An organization that scores 3 on monitoring but 0 on enforcement should focus on enforcement. Monitoring without enforcement means you can see the problem but can't stop it.
Vendor evaluation. When evaluating AI governance tooling, map the vendor's capabilities to the eight dimensions. Which dimensions does the tool address? Which does it leave unaddressed? This prevents buying a monitoring tool when the gap is in enforcement, or an audit tool when the gap is in identity.
This framework is not a standard. It is a starting point. The dimensions may evolve as the regulatory landscape matures and as AI agent architectures change. But the underlying question is permanent: do you know what your AI agents are doing, can you control it, and can you prove it? The score gives you a number. The dimensions give you a map.
Get your readiness score
We'll walk through the eight dimensions with your team and identify the highest-impact gaps. 20 minutes. Honest assessment.
Book a demo