AI Application Security: Best Practices for LLM, RAG and Agentic Systems

Why AI application security is different

AI application security is not about making the prompt stronger. It is about designing the system so that the model sees only what it should see, calls only what it should call, changes only what it is allowed to change, and leaves enough evidence for humans to review, monitor and recover from failures.

Traditional applications are usually built around deterministic endpoints, typed data structures, validation rules, role checks and database transactions. The application receives input, checks it, executes known code paths and returns a result. Bugs still happen, but the security model is usually based on controlling code, identity, data flow and state changes.

An AI application adds a different layer. It may process natural language, retrieved documents, tool outputs, e-mails, images, PDFs, screenshots, system prompts and user instructions in one context. The model may confuse data and instructions. Retrieved documents may contain malicious or misleading instructions. The output of the model may become input for another system. An AI agent may act with the permissions of a user, service account or tool. A failure may not be only a wrong answer, but also a wrong action.

Core idea

AI application security is not only about preventing bad answers. It is about controlling data exposure, permissions, tool access, execution, monitoring and responsibility.

Related foundation

If you are still deciding how much security a specific AI use case needs, start with Before You Start with AI: Value, Risk, and Safe Use. This article continues from that broader risk model into production architecture for LLM, RAG and agentic systems.

Frameworks such as OWASP Top 10 for LLM Applications, MITRE ATLAS and NIST AI RMF with the Generative AI Profile are useful because they give teams shared language for AI-specific risks. They should not be treated as paperwork. Their practical value is that they force the team to ask where the AI system receives instructions, which data it can access, which tools it can invoke and how a failure would be detected.

The key question is not only what the AI says. The key question is what the AI can see, what it can call, what it can change, what it can trigger, and who is responsible when it fails. This is especially important when AI moves from a chat interface into a RAG system, employee assistant, workflow automation or agentic runtime. At that point the model becomes one component inside an operational system.

Threat modeling before implementation

AI security should start with system architecture, not with a guardrail framework. Guardrails can reduce specific risks, but they cannot compensate for a system that gives the model too much data, too much authority or too many irreversible actions.

Start with protected assets. These can include internal data, customer data, documents, accounts, API keys, source code, business know-how, database records, financial actions and operational workflows. Then map inputs: user messages, uploaded files, PDFs, e-mails, websites, API responses, screenshots, tool outputs and messages from other agents. Each input should be treated as potentially untrusted until the application has classified it.

The next step is to draw trust boundaries. A common chain is user to application, application to prompt builder, prompt builder to RAG, RAG to model, model to tools, tools to external systems and the whole workflow into logs and monitoring. The important part is not the drawing. The important part is seeing where data crosses from one authority to another.

User / external input
  ->
Application backend
  ->
Prompt builder
  ->
RAG / context retrieval
  ->
LLM
  ->
Guardrails / policy layer
  ->
Tools / APIs / database / e-mail / files
  ->
Logs / monitoring / review / rollback

Permissions should be explicit. What can the model see? What can it suggest? What can it call? What can it change? A model that drafts a support answer has a different risk profile than an agent that can send e-mail, delete a database record or create a production deployment. This is where security depth should match the value, data exposure, permissions and operational reach of the use case. For a broader framing of that tradeoff, see AI value vs security.

Methods such as STRIDE and LINDDUN can help structure this work. They do not need to become a long theoretical exercise. For many AI products, a short workshop with architecture, data, permissions, abuse cases and incident response is already enough to catch the largest design mistakes before code is written.

Practical threat-modeling checklist

What data can the model see?
Is the data filtered by the user's permissions before it reaches the model?
What tools can the model call?
Can the model modify state or only suggest actions?
Are actions reversible?
Is there a human approval step?
What gets logged?
Who owns the workflow?

Prompt injection and instruction/data confusion

Prompt injection is the most visible AI application risk because it demonstrates the central problem clearly: the model can receive instructions from places the application did not intend to trust. Direct prompt injection happens when a user tries to override the task directly. Indirect prompt injection happens when malicious instructions are hidden in a document, website, e-mail, image, PDF or tool output that later becomes part of the model context.

The phrase "ignore previous instructions" is not just a meme. It is a symptom of a deeper issue: LLMs do not enforce security boundaries the way an operating system, database or backend policy engine does. They process instructions and data together in a probabilistic context. A system prompt can guide behavior, but it is not a security boundary. It is application logic that can be attacked, bypassed or misunderstood.

This matters even more for multimodal applications. Text embedded in a screenshot, white text in an image, instructions inside a scanned document or hidden content in a web page can become part of the model's input. If the model then summarizes the content, calls a tool or forwards the result to another system, the malicious instruction can influence downstream behavior.

Important boundary

Never send the model secrets it should not know. If a secret, credential, private key or sensitive record is placed in the prompt, the system has already lost an important security boundary.

Good defenses start with separation. System instructions, user input and external content should be represented separately in the application. Retrieved content should be treated as data, not as new instructions. Outputs should be validated before they are used downstream. Risky actions should not be executed directly by the model. Tool access should be limited to the specific task. Suspicious prompts, blocked actions and failed attacks should be logged.

The correct goal is not to claim that prompt injection is solved. The goal is to reduce likelihood, reduce impact, detect abuse, limit permissions and preserve rollback. When a prompt injection failure is discovered, it should become a regression test. Otherwise, the same class of failure will return after a prompt change, model upgrade, RAG change or tool definition update.

RAG security and document access control

Retrieval-augmented generation does not remove the need for access control. It adds a retrieval layer between the user, source systems, vector indexes and the model context. In a business environment, that layer may work with customer records, project files, meeting notes, policies, support tickets, contracts, source code and technical documentation. If the retrieval layer can place protected content into the prompt, it has become part of the application's authorization boundary.

User request
  ->
Authentication / Authorization
  ->
Resolve user, tenant, roles and data permissions
  ->
Apply security filters before retrieval
  ->
Retrieve only authorized chunks from the index
  ->
Build model context with citations and source metadata
  ->
LLM generates an answer
  ->
Output validation, audit log and source trace

The most important rule is simple: RAG must enforce the same access rules as the original source systems. If a user cannot open a document in the source system, the AI should not be able to use it in the answer. This control must happen before retrieval, not after the answer is generated. Once forbidden context has been placed into the prompt, the system has already exposed it to the model.

RAG access principle

RAG must enforce the same access rules as the original source systems. If a user cannot open a document in the source system, the AI should not be able to use it in the answer.

Security trimming should be part of every query. Each chunk should carry authorization metadata such as tenant, owner, role, sensitivity, source, timestamp and document ID. Retrieval should filter by that metadata before semantic search returns content to the prompt builder. Embeddings without authorization metadata are a long-term operational problem because the system loses the link between meaning and permission.

RAG also creates ingestion risks. A poisoned document can enter the knowledge base and influence answers. A document may contain hidden instructions for the model. Retrieval may return semantically similar but security-inappropriate context. A deleted document or a document with changed permissions may remain available in the vector index. Cross-tenant leakage can happen between customers, departments, projects or teams if metadata and filtering are not designed carefully.

Before retrieval

Authenticate the user, resolve their tenant, roles and data permissions, then query only indexes and chunks they are allowed to use.

During ingestion

Scan documents, attach metadata, preserve source IDs, classify sensitivity and separate public, internal, confidential and customer-specific sources.

During answer generation

Cite sources, keep retrieved content distinct from instructions, avoid unsupported claims and log which chunks were used.

During lifecycle changes

Reindex after permission changes, remove deleted content, enforce retention and test cross-tenant leakage regularly.

Source citation is not only a usability feature. It gives users and reviewers a way to inspect why an answer was produced. Logging used chunks helps incident response and debugging. If a user received confidential information, the team must be able to identify which chunk, document, source connector or permission rule allowed it.

Tool calling and agent permissions

Tool calling is the point where an AI application stops being only a text interface and becomes an operational system. The model does not merely answer. It proposes that a function, API, workflow, database query, browser action, file operation or external message should be executed. That makes tool calling one of the most important parts of AI application security.

The professional security model is simple: the model may propose an action, but it must not be the security authority. The backend must validate the tool name, parameters, user permissions, tenant scope, business rules, rate limits, approval state and audit requirements before anything is executed. In other words, tool calling should be treated like an untrusted request to a privileged backend operation.

User request
  ->
LLM proposes a tool call
  ->
Tool registry checks whether the tool exists
  ->
Schema validation checks parameters
  ->
Permission and tenant policy checks authority
  ->
Business rules check whether the action is allowed
  ->
Approval layer handles high-risk actions
  ->
Backend executes with scoped credentials
  ->
Structured result returns to the model
  ->
Audit log, trace, monitoring and rollback path

Core rule for tool calling

Never design production tool calling as "LLM decides, backend executes". Design it as "LLM proposes, backend validates, policy authorizes, humans approve when needed, backend executes and logs".

Tool definitions are part of the security surface. A vague tool such as get_data, run_sql or call_apigives the model too much ambiguity and gives the system too little control. Prefer domain-specific tools with clear names, strict schemas and narrow purpose: orders_get_status, email_create_draft, email_send, billing_prepare_refund and billing_execute_refund. Namespacing helps the model select the right capability and helps the security layer apply the right policy.

Every tool should be classified by risk. A read-only tool that returns an order status is not the same as a tool that sends an e-mail, changes a price, refunds a payment, deletes a record or deploys code. A useful baseline is to separate tools into read-only, draft or prepare, write, external communication and destructive actions. Each category should have different approval, logging and rollback requirements.

Read-only tools

Use for retrieval, status checks, calculations and knowledge lookup. Still enforce user and tenant permissions before data is returned to the model.

Prepare tools

Use to create drafts, proposed changes, suggested replies or approval requests. They should not perform the final external or irreversible action.

Write tools

Use only through strict schemas, business rules, idempotency keys, rate limits and clear ownership of the resulting state change.

External and destructive tools

Sending e-mail, payments, deletion, deployments, permission changes and production operations should require approval and complete audit evidence.

The safest pattern for risky actions is to split preparation from execution. For example, an assistant may call email_create_draft automatically, but email_send should require a user approval tied to the exact recipient, subject and body that were approved. The same pattern applies to refunds, invoices, order changes, price changes, deployments and database mutations.

Risky action pattern

prepare_change
  -> creates a draft or proposed operation
  -> user or policy approval captures an immutable snapshot
  -> execute_change runs only the approved operation
  -> audit log records who approved, what changed and when

Tool arguments must be validated as untrusted input. Use strict JSON schemas, runtime validation, enums, length limits, ID formats and additionalProperties: false where possible. Never build SQL queries, shell commands, file paths or HTTP requests by directly interpolating model-generated strings. A tool call can carry injection payloads just like a normal web request.

Tool outputs also need control. Return only the fields the model needs, not complete database records. Do not return password hashes, access tokens, internal notes, excessive PII or raw stack traces. External tool output, including web pages, e-mails, PDFs, tickets and CRM notes, should be treated as untrusted data. It can contain instructions aimed at the model, so it must not be allowed to change the system prompt, tool policy or approval rules.

Idempotency and retry behavior matter. Read-only tools can usually be retried. Write tools should use idempotency keys so a network timeout or repeated model call does not send the same e-mail twice, create duplicate tickets, issue duplicate refunds or repeat an order change. Destructive tools should have limited retry behavior and a rollback or compensation path where possible.

Agent loops need hard boundaries. Define maximum steps, maximum tool calls, maximum runtime, maximum retry count, maximum daily cost and maximum number of write or external actions. Without these limits, an agent can get stuck in a loop, overload internal systems, spend unexpected API budget or repeatedly attempt an unsafe action.

Tool-calling systems fail in subtle ways. The agent may use the right tool in the wrong context. It may execute an action with excessive permissions. Indirect prompt injection may make the agent send data outside the system. Tool output poisoning may change the next step. Memory poisoning may influence future decisions. An agent with broad permissions can amplify an incident through its own access. The highest-risk failures are silent irreversible actions.

A professional tool registry should document the tool name, version, owner, purpose, schema, risk level, required permissions, tenant scope, timeout, rate limit, approval policy, logging policy and data classification. That registry can be implemented directly in application code, in an internal service, or through a protocol such as MCP. MCP can standardize how tools and resources are exposed to agents, but it does not replace permission checks, business rules, tenant isolation, approval, audit and monitoring.

Human approval should be required for high-risk actions such as e-mail sending, payments, deletion, database changes, deployments, permission changes and external communication. This does not make the system less useful. It often makes the workflow more usable because people can trust that the AI prepares work without silently crossing operational boundaries. See AI automation for why process clarity matters before automation, and AI assistants for employee workflows where human-in-the-loop design is the right default.

Experimental agents should run in isolated environments. Runtime isolation, controlled execution and deployment boundaries matter when agents can read files, call APIs or run scheduled tasks. Tools such as execute_shell, run_python, browser_click, read_file, write_file, git_push and deploy_to_production should be treated as high-risk capabilities and isolated with separate accounts, containers, sandboxes or virtual machines. The AI automation server article expands on that operational environment.

Tool-calling checklist

Use domain-specific tool wrappers instead of exposing raw APIs.
Keep read, prepare, write, external and destructive tools separate.
Validate tool arguments outside the model with strict schemas.
Enforce user, role, tenant and capability permissions in backend code.
Use scoped server-side credentials; never send secrets to the model.
Minimize tool outputs and mark external content as untrusted data.
Use human approval for external, financial, destructive and production actions.
Use idempotency keys for write actions and safe retry policies.
Limit steps, runtime, retries, cost and number of write actions.
Log every tool call with validated arguments, result, user, session and approval status.
Version tool schemas and run evals for tool selection and argument correctness.
Provide a way to disable risky tools quickly during an incident.

Guardrails and policy controls

Guardrails are useful, but they are only one layer. They can inspect inputs, control context, validate outputs, block forbidden content, detect PII, enforce schemas, check groundedness and apply business rules. They are most effective when the rest of the architecture is already designed with access control, least privilege and backend validation.

Input guardrails can classify requests, detect obvious abuse and reject unsupported formats. Context guardrails can prevent sensitive or unrelated content from being added to the prompt. Output guardrails can check schema, tone, forbidden categories, source usage and sensitive data. Action guardrails can inspect tool requests before execution. PII detection and redaction can reduce exposure when sensitive data is not required for the task.

Guardrail role

Guardrails reduce risk. They do not replace access control, least privilege, backend validation or human approval.

Tools such as NeMo Guardrails, LlamaGuard, Guardrails AI, Presidio and custom policy checks can help, but the tool choice is not the main architectural decision. The main question is where the control is enforced and what happens after a block. A guardrail that only changes the final text is not enough if the model has already received data it should not see or has already proposed a dangerous tool call.

Treat guardrails as observable controls. Log what they blocked, why they blocked it and which prompt, retrieval result or tool call was involved. If a guardrail blocks something legitimate, improve the workflow. If it blocks an attack, add it to the evaluation suite.

Data protection, privacy and secrets

AI security is also data minimization. Do not send sensitive data to the model unless it is necessary for the task. Redact or mask PII where possible. Separate tenants and customer environments. Encrypt data in transit and at rest. Define prompt and log retention policies before production usage begins.

The safest sensitive data is the data that never reaches the model. This is especially important for API keys, passwords, private keys, production credentials, database dumps, personal data and regulated customer records. These should not be placed in prompts, logs, test fixtures or agent memory. If a workflow needs credentials, the backend should use controlled secret storage and scoped tokens, not natural-language prompt context.

Observability must be balanced with privacy. Teams need enough logs to investigate incidents, debug retrieval, review tool calls and understand failures. At the same time, logs must not become a new sensitive-data leak. This requires retention limits, access control, redaction and auditing of who inspected AI traces.

Compliance requirements depend on the data and jurisdiction, but the technical baseline is consistent: know what data enters the AI layer, who is allowed to access it, where it is stored, how long it is retained and how it can be deleted. Audit who accessed what data through the AI layer, not only through the source application.

Secure AI development lifecycle and CI/CD evaluations

Production AI systems need repeatable tests because important behavior is not defined only in application code. Prompts, model versions, retrieval settings, tool definitions and guardrail rules all influence what the system sees, says and does. Treat these parts as versioned application configuration: review them in pull requests, track what changed and test the effect before release.

This matters because a small prompt edit, retrieval tuning change or model upgrade can change refusals, tool selection, groundedness, source usage and sensitivity to injected instructions. A workflow that behaved safely last month should not be assumed to behave the same way after the AI layer changes.

Keep evaluation suites in the repository and run AI-specific tests in pull requests. Those tests should cover prompt injection, indirect prompt injection, data leakage, cross-tenant access, unsafe tool calls, groundedness, source usage and expected refusals.

Pull request
  ->
Unit tests
  ->
Prompt / RAG / tool evaluations
  ->
Security abuse tests
  ->
Human review
  ->
Staging
  ->
Monitored production rollout

Tools such as promptfoo, garak, PyRIT, custom eval suites and GitHub Actions can support this process. The exact stack matters less than repeatability. Every discovered AI failure should become a regression test. If a user finds a prompt injection, if a document leaks across tenants, if a tool call is allowed with the wrong parameters, or if the model answers without sources, the fix should be captured in tests.

Security gates should run before deployment. Staging should use representative prompts, realistic retrieval data and realistic tool permissions without exposing production secrets. Production rollout should be monitored, especially after changing the model, prompt, retrieval index, guardrails or tool schema.

Observability, audit and incident response

Classic logs are not enough for AI systems. For security and debugging, teams need traces that show what happened around the model. A useful trace can include user and session, prompt template version, model and model version, retrieved context, tool calls, tool parameters, guardrail decisions, blocked actions, final output, latency, cost, rate anomalies and repeated attack attempts.

Centralized traces make investigation possible. Alerting should detect suspicious patterns such as repeated prompt injection attempts, unusual retrieval volume, repeated blocked tool calls, abnormal cost, unusual latency or access to sensitive document classes. Enterprise systems may need SIEM integration so AI events are visible beside other security telemetry.

Incident response should be designed before the first incident. The team should be able to disable a risky tool quickly, roll back a prompt or model change, rotate exposed credentials and identify affected users or documents. Post-incident work should include a regression test, not only a patch.

Incident response playbook

Detect suspicious input, output or tool call.
Identify user, session and affected data.
Check retrieved context and tool calls.
Disable risky tool or workflow if needed.
Rotate exposed credentials if needed.
Fix policy, prompt, retrieval or tool validation.
Add regression test.
Document the incident.

Privacy balance

Log enough for investigation, but avoid creating excessive sensitive-data retention. AI observability should help security without becoming another uncontrolled data store.

Model and supply-chain security

An AI application depends on more than application code. It also depends on models, prompts, embeddings, datasets, vector indexes, tools and evaluation suites. This means supply-chain review should include AI artifacts, not only npm packages, containers and backend dependencies.

Teams should know model provenance, license constraints and production approval status. Open-source models can be valuable, but they should be reviewed like other third-party artifacts. Dataset provenance, embedding model choice, vector index management and unsafe serialization formats all belong in the risk review. Dependency scanning should continue, and larger organizations may document AI artifacts in an AIBOM or extended SBOM.

CI/CD should block unapproved models or unknown artifacts in production workflows. This does not need to dominate every AI project, but it matters when AI becomes part of customer-facing, regulated or operational systems. The goal is to know what the system is made of and to prevent accidental introduction of unknown models, indexes or tools.

Governance and ownership

Technical controls need ownership. Without ownership, AI security becomes nobody's job. Every production AI use case should have a use-case owner, data owner, risk classification, approved models, approved tools, human oversight model, review process and incident responsibility.

Governance should not turn the article into a legal compliance checklist. Its practical purpose is to keep the system understandable as it changes. Who can approve a new tool? Who decides whether a new document source can be indexed? Who reviews logs? Who accepts the risk of automation? Who responds if the assistant leaks data or performs the wrong action?

NIST AI RMF and the Generative AI Profile can help teams structure governance. Some organizations will also need to consider regulatory frameworks such as the EU AI Act, depending on the use case. For most technical teams, the immediate value is simpler: document key decisions, review prompts and tools periodically, check permissions, inspect logs and make sure humans know when they remain responsible.

Reference architecture for a secure AI application

A secure AI application treats the LLM as one component in a controlled system, not as the uncontrolled center of the system. The backend remains the authority. Access control happens before data is retrieved. Tool execution happens through policy and backend code. High-risk actions require approval. Logs make behavior traceable.

User
  ->
Authentication / Authorization
  ->
Application backend
  ->
Input validation
  ->
Prompt builder
  ->
RAG retrieval with security trimming
  ->
LLM gateway
  ->
Output validation / guardrails
  ->
Tool policy layer
  ->
Human approval for high-risk actions
  ->
Execution by backend services
  ->
Audit logs / traces / monitoring
  ->
Incident response / rollback

Authentication and authorization identify the user and enforce permissions. The backend keeps authority outside the model. Input validation classifies and validates requests. The prompt builder structures instructions and context safely. RAG with security trimming retrieves only allowed context. The LLM gateway manages models, rate limits and provider configuration.

Output validation and guardrails check format, safety and sensitive data. The tool policy layer decides which actions are allowed. Human approval protects risky or irreversible actions. Backend services execute the action using controlled application logic. Audit logs and monitoring make the behavior traceable. Incident response and rollback make recovery possible when something goes wrong.

Architecture principle

The LLM should be one component in a controlled system, not the uncontrolled center of the system.

Final pre-production checklist

The goal is not to make AI risk-free. The goal is to make AI systems bounded, observable, testable and recoverable. A production AI application should create value without becoming an uncontrolled actor inside the organization.

Architecture

Do we know what the AI can see, call and change?
Are trust boundaries documented?
Is there a clear owner?
Is the model only one component in a controlled backend architecture?

Data and RAG

Are documents filtered before retrieval?
Are permissions enforced per user and tenant?
Are retrieved chunks logged?
Are poisoned documents considered?
Is there a lifecycle process for deleted or permission-changed documents?

Agents and tools

Are tools domain-specific wrappers rather than raw APIs?
Are read, prepare, write, external and destructive tools separated?
Are tool parameters validated outside the model with strict schemas?
Are tools scoped with least privilege, tenant boundaries and server-side credentials?
Do high-risk actions require approval tied to an immutable action snapshot?
Are write tools idempotent and protected against unsafe retries?
Are tool outputs minimized and external content marked as untrusted data?
Are step, runtime, retry, cost and write-action limits enforced?
Can risky tools be disabled quickly?

Testing

Do we test prompt injection?
Do we test indirect prompt injection?
Do we test data leakage?
Do we test cross-tenant access?
Do we test unsafe tool calls?
Do we run evals in CI/CD?
Does every discovered failure become a regression test?

Operations

Are prompts, model versions, retrieved context and tool calls observable?
Are blocked actions logged?
Is there an incident playbook?
Can we roll back?
Are logs protected from becoming a new data leak?

Governance

Is there a use-case owner?
Is there a data owner?
Are approved models and tools documented?
Is there a review process?
Is human oversight defined where needed?

Related AI guides

Before You Start with AI: Value, Risk, and Safe UseHow to match security depth to value, data exposure, permissions and operational reach.Before AI Automation: Process First, AI SecondWhy process clarity matters before automated AI workflows are connected to real operations.AI Assistant for Human WorkHow to design employee AI assistants around context, permissions and human approval.AI Automation ServerHow runtime isolation and controlled execution help when AI agents run operational tasks.

AI Application Security: Technical Best Practices