Most security teams have a process for evaluating new software before production deployment. Penetration test, architecture review, threat model, sign-off. The process was not designed for AI agents. It misses the risks that are specific to autonomous systems that make decisions at runtime.

This article proposes a risk assessment framework for AI agents. It has five categories, each with specific questions, and each category is weighted by impact. The output is a risk score that maps to governance requirements: the controls the agent needs before it can go to production.

The framework is designed to be used by security teams evaluating agents built by other teams. It assumes the security team may not have deep AI expertise. The questions are operational, not theoretical.

Category 1: Data exposure

Weight: 30%

What data can this agent access, and what can it do with it?

Data exposure carries the highest weight because it is where regulatory liability concentrates. A GDPR violation from uncontrolled data access can cost up to 4% of global annual turnover. The data exposure assessment determines what data classifications the agent touches and whether the agent's access is appropriately scoped.

The question about write access is critical. An agent that reads customer records is a monitoring concern. An agent that writes customer records is a data integrity concern. The risk profile is categorically different.

The question about external model providers matters because data sent to an external API is data that has left your perimeter. Under GDPR, this may constitute a data transfer to a third-party processor. Under DORA, this is an ICT third-party service provider relationship. Both have contractual and governance requirements.
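To make the assessment concrete, the data exposure dimension can be sketched as a small scoring function. The classification levels, severity values, and the write-access multiplier below are illustrative assumptions for this sketch, not fixed values from the framework; the clamp keeps the result on the category's 0-10 scale.

```python
# Hypothetical severity values per data classification (assumptions for this sketch).
CLASSIFICATION_SEVERITY = {"public": 0, "internal": 2, "confidential": 5, "personal": 8}

def data_exposure_score(grants):
    """grants: list of (classification, access) tuples, access in {"read", "write"}.

    Returns a 0-10 score driven by the most sensitive grant.
    """
    score = 0
    for classification, access in grants:
        base = CLASSIFICATION_SEVERITY[classification]
        # Write access roughly doubles the concern: integrity, not just monitoring.
        score = max(score, base * (2 if access == "write" else 1))
    return min(score, 10)  # clamp to the category's 0-10 scale

print(data_exposure_score([("personal", "read")]))   # read-only personal data
print(data_exposure_score([("personal", "write")]))  # write access hits the cap
```

The `max` reflects that exposure is dominated by the most sensitive grant, not the average; a single writable store of personal data outweighs any number of public read scopes.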

Category 2: Decision authority

Weight: 25%

What can this agent decide, and what are the consequences of a wrong decision?

Decision authority is the dimension that distinguishes AI agents from traditional API integrations. A traditional integration executes predefined logic. An AI agent decides what to do at runtime based on natural language instructions and model inference. The decisions may vary between runs, even with identical inputs.

The question about reversibility is often overlooked. An agent that recommends a product (reversible: the customer can choose differently) has a different risk profile from an agent that submits a regulatory filing (difficult to reverse once submitted). The cost of a wrong decision is a function of both the probability of error and the cost of correction.
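The "probability of error times cost of correction" framing can be made explicit with a small expected-cost calculation. The numbers below are invented for illustration: a cheap-to-reverse recommendation tolerates a high error rate, while a hard-to-reverse filing is costly even when errors are rare.

```python
def expected_decision_cost(error_rate, correction_cost, volume):
    """Expected cost of wrong decisions over a given decision volume."""
    return error_rate * correction_cost * volume

# Reversible recommendation: frequent errors, trivial to correct.
recommendation = expected_decision_cost(0.05, 2.0, 10_000)

# Hard-to-reverse regulatory filing: rare errors, expensive to unwind.
filing = expected_decision_cost(0.001, 250_000.0, 100)

print(recommendation, filing)
```

Even at a fiftieth of the error rate and a hundredth of the volume, the filing agent's expected cost dwarfs the recommender's, which is why reversibility belongs in the assessment.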

Category 3: Tool access

Weight: 20%

What can this agent do in the world?

Tools are how agents affect the world. A language model without tools is a text generator. A language model with tools is an autonomous system that can read databases, send emails, call APIs, modify files, and execute code. The risk assessment for tools is not about whether the agent needs the tools (it probably does), but whether the agent has tools it doesn't need.

The question about dangerous combinations is subtle but important. An agent that can read customer data is fine. An agent that can send external emails is fine. An agent that can do both could exfiltrate customer data via email. The combination creates a risk that neither capability creates alone. Security teams need to evaluate tool combinations, not just individual tools.
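Evaluating combinations rather than individual tools can be automated with a denylist of capability pairs. The pair names below are assumptions for this sketch; a real list would come from your own threat model.

```python
from itertools import combinations

# Illustrative denylist: capabilities that are individually acceptable
# but dangerous together (names are assumptions for this sketch).
DANGEROUS_PAIRS = {
    frozenset({"read_customer_data", "send_external_email"}),
    frozenset({"read_secrets", "execute_code"}),
}

def flag_tool_combinations(tools):
    """Return every denylisted pair present in the agent's tool set."""
    return [set(pair) for pair in combinations(sorted(tools), 2)
            if frozenset(pair) in DANGEROUS_PAIRS]

agent_tools = {"read_customer_data", "send_external_email", "query_weather"}
print(flag_tool_combinations(agent_tools))  # flags the exfiltration pair
```

A pairwise denylist scales quadratically with the tool count, which is manageable for the handful of tools most agents carry; larger deployments may want policy rules over capability categories instead.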

Category 4: Supply chain

Weight: 15%

What external dependencies does this agent have, and how are they managed?

Supply chain risk for AI agents is unusually opaque. When you depend on a traditional API, you can read the documentation, test the endpoints, and understand the behavior. When you depend on a language model, you cannot fully predict its behavior, you cannot inspect its internals, and the vendor can change it without notice.

The question about model versioning is practical and often missed. Most agents use whatever version the provider's API returns. When OpenAI updates GPT-4, every agent using GPT-4 is now running on a different model. There is no change management process. No regression testing. The agent that was tested and approved is no longer the agent that is running.
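A basic version-pinning check can catch floating model identifiers at review time. The pattern below assumes pinned names carry a date suffix in the OpenAI style (e.g. a `-YYYY-MM-DD` ending); other providers use different versioning schemes, so the regex is an assumption to adapt, not a universal rule.

```python
import re

# Assumption for this sketch: a pinned model name ends in a date suffix.
PINNED_PATTERN = re.compile(r".*-\d{4}-\d{2}-\d{2}$")

def unpinned_models(config_models):
    """Return model identifiers that will silently float to the latest version."""
    return [m for m in config_models if not PINNED_PATTERN.match(m)]

print(unpinned_models(["gpt-4o", "gpt-4o-2024-08-06"]))  # only the unpinned alias
```

Pinning alone does not solve the problem, since providers eventually retire dated versions, but it turns a silent model swap into an explicit change you can regression-test.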

Category 5: Operational resilience

Weight: 10%

What happens when things go wrong?

Operational resilience gets the lowest weight because its failures are usually financial or operational rather than regulatory. A runaway agent that spends EUR 40,000 in token costs is painful but recoverable. An agent that leaks customer data or makes a discriminatory decision has regulatory consequences that a budget overrun does not.

That said, operational resilience failures can cascade. An agent that enters an infinite loop and exhausts its budget may leave work incomplete. An agent that crashes without clean error handling may leave data in an inconsistent state. The operational resilience assessment ensures that failure modes are understood and managed.
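One concrete control for the runaway-spend failure mode is a hard token budget that halts the agent loop before costs accumulate. This is a minimal sketch; the class name, limits, and per-step cost are assumptions for illustration.

```python
class BudgetExceeded(RuntimeError):
    pass

class TokenBudget:
    """Minimal spend guard: stop the agent before costs run away."""
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        self.used += tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(
                f"budget exhausted: {self.used}/{self.max_tokens} tokens")

budget = TokenBudget(max_tokens=10_000)
for step in range(100):          # a loop that would otherwise run indefinitely
    try:
        budget.charge(1_500)     # assumed token cost of one agent step
    except BudgetExceeded as exc:
        print(f"stopped at step {step}: {exc}")
        break
```

Raising an exception rather than returning a flag matters here: it forces the calling code to handle the halt explicitly, which is where the "clean error handling" part of the assessment applies.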

Scoring and thresholds

Each category is scored 0-10 based on the answers to its questions. The weighted sum of the category scores, multiplied by 10, produces a risk score from 0 to 100.
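The scoring arithmetic is straightforward to encode. The weights below are the ones stated in each category above; the sample scores are invented for illustration.

```python
# Category weights as stated in the framework (they sum to 1.0).
WEIGHTS = {
    "data_exposure": 0.30,
    "decision_authority": 0.25,
    "tool_access": 0.20,
    "supply_chain": 0.15,
    "operational_resilience": 0.10,
}

def risk_score(category_scores):
    """Weighted sum of 0-10 category scores, scaled to a 0-100 total."""
    assert set(category_scores) == set(WEIGHTS), "score every category"
    return 10 * sum(WEIGHTS[c] * s for c, s in category_scores.items())

print(risk_score({
    "data_exposure": 8,
    "decision_authority": 6,
    "tool_access": 5,
    "supply_chain": 4,
    "operational_resilience": 3,
}))
```

Because the weights sum to 1.0, the weighted sum of 0-10 scores lands on a 0-10 scale; the factor of 10 stretches it to the 0-100 range the thresholds use.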

Go/no-go criteria

Independent of the total score, certain findings should be automatic blockers.

These are not risk-tolerance decisions. They are minimum requirements. An agent that fails any of these criteria needs remediation before the risk score is relevant.


This framework is not exhaustive. It does not replace a full threat model or a formal security assessment. What it does is give security teams a structured starting point for evaluating AI agents, using questions that are specific enough to produce actionable findings and general enough to apply across frameworks, model providers, and use cases. The goal is to move from "we don't know how to evaluate AI agents" to "here is our assessment, here are the gaps, and here is what we need before production."

Automate your risk assessment

TapPass scans your agent's configuration and produces a risk score with specific remediation steps. See it on your agents.

Book a demo