Most security teams have a process for evaluating new software before production deployment. Penetration test, architecture review, threat model, sign-off. The process was not designed for AI agents. It misses the risks that are specific to autonomous systems that make decisions at runtime.
This article proposes a risk assessment framework for AI agents. It has five categories, each with specific questions; the categories are weighted by impact. The output is a risk score that maps to governance requirements: what controls the agent needs before it can go to production.
The framework is designed to be used by security teams evaluating agents built by other teams. It assumes the security team may not have deep AI expertise. The questions are operational, not theoretical.
Category 1: Data exposure
What data can this agent access, and what can it do with it?
- What data sources does the agent connect to? (databases, APIs, file systems, knowledge bases)
- Does the data include personal data (GDPR scope)?
- Does the data include financial data (DORA/PCI scope)?
- Does the data include health data (GDPR special category, Article 9)?
- Can the agent write data, or only read?
- Is the data sent to external model providers? If so, where are the servers located?
- Is the data included in model training (opt-out verified)?
- What is the maximum data volume the agent can access in a single session?
Data exposure carries the highest weight because it is where regulatory liability concentrates. A GDPR violation from uncontrolled data access can cost up to 4% of global annual turnover or EUR 20 million, whichever is higher. The data exposure assessment determines what data classifications the agent touches and whether the agent's access is appropriately scoped.
The question about write access is critical. An agent that reads customer records is a monitoring concern. An agent that writes customer records is a data integrity concern. The risk profile is categorically different.
The question about external model providers matters because data sent to an external API is data that has left your perimeter. Under GDPR, this may constitute a data transfer to a third-party processor. Under DORA, this is an ICT third-party service provider relationship. Both have contractual and governance requirements.
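The questions above can be answered systematically if each agent ships with a data-access inventory. Here is a minimal sketch; the manifest fields, classification labels, and regulatory mapping are illustrative assumptions, not a standard:

```python
# Hypothetical manifest: each data source the agent connects to,
# with its data classifications and the agent's access mode.
AGENT_DATA_SOURCES = [
    {"name": "crm_db", "classes": {"personal"}, "access": "read_write"},
    {"name": "payments_api", "classes": {"financial"}, "access": "read"},
    {"name": "product_docs", "classes": set(), "access": "read"},
]

# Illustrative mapping from data class to regulatory scope.
REGULATORY_SCOPE = {
    "personal": "GDPR",
    "financial": "DORA/PCI DSS",
    "health": "GDPR Art. 9 special category",
}

def exposure_findings(sources):
    """Flag sources that pull the agent into regulatory scope, and
    escalate any source the agent can also write to: read access is a
    monitoring concern, write access is a data integrity concern."""
    findings = []
    for src in sources:
        for cls in src["classes"]:
            scope = REGULATORY_SCOPE.get(cls)
            if scope:
                severity = "integrity" if src["access"] == "read_write" else "monitoring"
                findings.append((src["name"], scope, severity))
    return findings
```

Running `exposure_findings(AGENT_DATA_SOURCES)` surfaces the CRM database as an integrity-level GDPR finding and the payments API as a monitoring-level DORA/PCI finding; the unclassified documentation source produces no finding.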
Category 2: Decision authority
What can this agent decide, and what are the consequences of a wrong decision?
- What decisions does the agent make autonomously (without human approval)?
- What is the maximum financial impact of a single incorrect decision?
- Can the agent's decisions affect individuals' rights or opportunities? (credit, employment, insurance, access to services)
- Are the agent's decisions reversible? How quickly?
- Does the agent interact with customers or external parties directly?
- Is there a human-in-the-loop for high-impact decisions?
- Would this agent likely be classified as high-risk under EU AI Act Annex III?
Decision authority is the dimension that distinguishes AI agents from traditional API integrations. A traditional integration executes predefined logic. An AI agent decides what to do at runtime based on natural language instructions and model inference. The decisions may vary between runs, even with identical inputs.
The question about reversibility is often overlooked. An agent that recommends a product (reversible: the customer can choose differently) has a different risk profile from an agent that submits a regulatory filing (difficult to reverse once submitted). The cost of a wrong decision is a function of both the probability of error and the cost of correction.
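The human-in-the-loop question can be enforced mechanically. A minimal sketch, assuming a simplified decision shape and an approval threshold that would come from your own policy:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    max_financial_impact: float   # worst-case cost in EUR (assumed field)
    affects_individual_rights: bool
    reversible: bool

APPROVAL_THRESHOLD_EUR = 1000.0   # assumed policy value

def requires_human_approval(d: Decision) -> bool:
    """Route high-impact, rights-affecting, or hard-to-reverse
    decisions to a human before the agent may act autonomously."""
    return (
        d.max_financial_impact > APPROVAL_THRESHOLD_EUR
        or d.affects_individual_rights
        or not d.reversible
    )
```

Under this gate, a small reversible refund proceeds autonomously, while a regulatory filing (irreversible) or a credit decision (affects individual rights) always waits for a human, regardless of its financial impact.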
Category 3: Tool access
What can this agent do in the world?
- What tools does the agent have access to? (APIs, shell commands, file system, email, messaging)
- Can the agent execute arbitrary code?
- Can the agent send communications externally? (email, Slack, SMS, API calls to external services)
- Can the agent modify infrastructure? (deploy code, change configurations, modify DNS)
- Are tool permissions scoped to the minimum required for the agent's purpose?
- Are dangerous tool combinations restricted? (e.g., read customer data AND send external email)
- Is tool usage rate-limited?
Tools are how agents affect the world. A language model without tools is a text generator. A language model with tools is an autonomous system that can read databases, send emails, call APIs, modify files, and execute code. The risk assessment for tools is not about whether the agent needs the tools (it probably does), but whether the agent has tools it doesn't need.
The question about dangerous combinations is subtle but important. An agent that can read customer data is fine. An agent that can send external emails is fine. An agent that can do both could exfiltrate customer data via email. The combination creates a risk that neither capability creates alone. Security teams need to evaluate tool combinations, not just individual tools.
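Evaluating combinations rather than individual tools can be sketched as a check against a list of dangerous pairs. The tool names and pairs below are illustrative; a real list should come from your own threat model:

```python
# Each entry pairs sensitive-read capabilities with external-egress
# capabilities; holding one tool from each side is an exfiltration path.
DANGEROUS_PAIRS = [
    ({"read_customer_data"}, {"send_email", "post_to_slack", "http_request"}),
    ({"read_secrets"}, {"http_request"}),
    ({"execute_code"}, {"deploy", "modify_dns"}),
]

def combination_findings(granted_tools):
    """Return (source, sink) pairs where the agent holds both a
    sensitive-read tool and an external-egress tool."""
    granted = set(granted_tools)
    findings = []
    for sources, sinks in DANGEROUS_PAIRS:
        for s in sources & granted:
            for k in sinks & granted:
                findings.append((s, k))
    return sorted(findings)
```

An agent granted only `read_customer_data` and an internal summarization tool produces no findings; grant it `send_email` as well and the check flags the exfiltration path immediately.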
Category 4: Supply chain
What external dependencies does this agent have, and how are they managed?
- Which model provider(s) does the agent use?
- Does the agent pin to a specific model version, or does it use "latest"?
- What happens when the model provider updates the model?
- What happens when the model provider is unavailable?
- Are there other agents that depend on this agent's output?
- Does the agent use third-party tools or plugins that haven't been security-reviewed?
- Is there a fallback if the primary model provider goes down?
Supply chain risk for AI agents is unusually opaque. When you depend on a traditional API, you can read the documentation, test the endpoints, and understand the behavior. When you depend on a language model, you cannot fully predict its behavior, you cannot inspect its internals, and the vendor can change it without notice.
The question about model versioning is practical and often missed. Most agents use whatever version the provider's API returns. When OpenAI updates GPT-4, every agent using GPT-4 is now running on a different model. There is no change management process. No regression testing. The agent that was tested and approved is no longer the agent that is running.
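A pre-deployment check can at least reject obviously unpinned configurations. This is a heuristic sketch; providers use different naming schemes, so the alias set and the example identifier below are assumptions to adapt:

```python
# Bare aliases that float to whatever the provider currently serves.
UNPINNED_ALIASES = {"latest", "default", "auto"}

def is_pinned(model_name: str) -> bool:
    """Reject bare aliases and '-latest'-style suffixes; require a
    name that carries an explicit version or date segment,
    e.g. "some-model-2024-06-01" (hypothetical identifier)."""
    name = model_name.lower()
    if name in UNPINNED_ALIASES or name.endswith("-latest"):
        return False
    # Heuristic: pinned names contain a numeric version/date segment.
    return any(part.isdigit() for part in name.replace(".", "-").split("-"))
```

This will not catch every floating alias (a provider can silently re-point a dated name too), but it turns "are we pinned?" from a question nobody asks into a failing check in the deployment pipeline.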
Category 5: Operational resilience
What happens when things go wrong?
- What is the agent's failure mode? (crash, degrade gracefully, retry, or hallucinate a result?)
- Is there a budget limit that prevents runaway cost?
- Is there a loop guard that prevents infinite tool-calling cycles?
- Is there a timeout that kills the agent if it runs too long?
- Is the agent's resource consumption monitored? (tokens, API calls, compute)
- Has the agent been tested with adversarial inputs? (prompt injection, malformed data)
- Is there a kill switch that can stop the agent immediately?
Operational resilience gets the lowest weight because its failures are usually financial or operational rather than regulatory. A runaway agent that spends EUR 40,000 in token costs is painful but recoverable. An agent that leaks customer data or makes a discriminatory decision has regulatory consequences that a budget overrun does not.
That said, operational resilience failures can cascade. An agent that enters an infinite loop and exhausts its budget may leave work incomplete. An agent that crashes without clean error handling may leave data in an inconsistent state. The operational resilience assessment ensures that failure modes are understood and managed.
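The budget limit, loop guard, and timeout questions describe three guards that belong in the agent runtime itself. A minimal sketch, with assumed policy values for the limits:

```python
import time

class GuardTripped(Exception):
    """Raised when any runtime limit is exceeded."""

class RunGuards:
    """Budget limit, tool-call loop guard, and wall-clock timeout.
    The defaults are illustrative policy values."""

    def __init__(self, max_cost_eur=25.0, max_tool_calls=50, max_seconds=300):
        self.max_cost_eur = max_cost_eur
        self.max_tool_calls = max_tool_calls
        self.deadline = time.monotonic() + max_seconds
        self.cost_eur = 0.0
        self.tool_calls = 0

    def check(self, step_cost_eur=0.0):
        """Call before every tool invocation; raises so the run stops
        cleanly instead of running away."""
        self.cost_eur += step_cost_eur
        self.tool_calls += 1
        if self.cost_eur > self.max_cost_eur:
            raise GuardTripped("budget limit exceeded")
        if self.tool_calls > self.max_tool_calls:
            raise GuardTripped("loop guard: too many tool calls")
        if time.monotonic() > self.deadline:
            raise GuardTripped("timeout exceeded")
```

The important design choice is that the guards raise rather than log: a tripped guard halts the run at a known point, which is what makes the "kill switch" question answerable.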
Scoring and thresholds
Each category is scored 0-10 based on the answers to its questions, then multiplied by the category's weight. The weights sum to 10, so the weighted total produces a risk score from 0 to 100.
- 0-20: Low risk. Standard monitoring controls. Agent can proceed to production with baseline governance.
- 21-40: Moderate risk. Enhanced monitoring required. Per-agent identity, budget limits, structured audit trail, and tool restrictions must be in place before production.
- 41-60: High risk. Full governance controls required. Human-in-the-loop for consequential decisions. Data classification enforcement. Immutable audit trail. Incident response playbook specific to this agent.
- 61-80: Critical risk. Senior leadership approval required. Dedicated security review. Continuous monitoring with automated response. Regular penetration testing. Regulatory impact assessment.
- 81-100: Unacceptable risk. Do not deploy without fundamental architectural changes. Reduce the agent's scope, data access, or decision authority before re-assessment.
Go/no-go criteria
Independent of the total score, certain findings should be automatic blockers:
- No-go: Agent accesses personal data and sends it to a model provider without a data processing agreement.
- No-go: Agent can execute arbitrary code without sandboxing.
- No-go: Agent makes decisions affecting individuals' rights with no human oversight mechanism.
- No-go: No audit trail exists for the agent's actions.
- No-go: Agent uses shared API keys with no individual identity.
- No-go: No budget limit exists (agent can consume unlimited resources).
These are not risk-tolerance decisions. They are minimum requirements. An agent that fails any of these criteria needs remediation before the risk score is relevant.
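The blocker list lends itself to a simple automated check. A sketch, assuming a hypothetical agent-profile dict whose field names are illustrative:

```python
# Each blocker is a named predicate over the agent profile.
BLOCKERS = {
    "personal data sent to provider without a DPA":
        lambda a: a["sends_personal_data_to_provider"] and not a["has_dpa"],
    "arbitrary code execution without sandboxing":
        lambda a: a["executes_arbitrary_code"] and not a["sandboxed"],
    "rights-affecting decisions with no human oversight":
        lambda a: a["affects_individual_rights"] and not a["human_oversight"],
    "no audit trail":
        lambda a: not a["has_audit_trail"],
    "shared API keys, no individual identity":
        lambda a: not a["has_individual_identity"],
    "no budget limit":
        lambda a: not a["has_budget_limit"],
}

def no_go_findings(agent):
    """Return every blocker the agent trips; any hit means remediate
    before the risk score is even relevant."""
    return [name for name, tripped in BLOCKERS.items() if tripped(agent)]
```

Because the blockers are independent of the score, this check runs first: an empty result is the precondition for the weighted scoring to matter at all.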
This framework is not exhaustive. It does not replace a full threat model or a formal security assessment. What it does is give security teams a structured starting point for evaluating AI agents, using questions that are specific enough to produce actionable findings and general enough to apply across frameworks, model providers, and use cases. The goal is to move from "we don't know how to evaluate AI agents" to "here is our assessment, here are the gaps, and here is what we need before production."
Automate your risk assessment
TapPass scans your agent's configuration and produces a risk score with specific remediation steps. See it on your agents.
Book a demo