AI Security

OWASP Top 10 for LLM Applications

An OWASP project that catalogs the most critical security risks specific to applications built on large language models, including Prompt Injection, Insecure Output Handling, Training Data Poisoning, Model Denial of Service, Supply-Chain Vulnerabilities, Sensitive Information Disclosure, Insecure Plugin/Tool Design, Excessive Agency, Overreliance, and Model Theft. It is the de facto framework for threat-modeling LLM applications and is referenced across AI security certifications such as AICP, AAISM, and AIGP.

Why It Matters

In practice, the OWASP LLM Top 10 is critical because it gives security teams a shared, LLM-specific vocabulary and checklist to threat-model systems that traditional web-app frameworks (the original OWASP Top 10) do not fully cover. It highlights uniquely generative-AI failure modes — for example Excessive Agency (an LLM agent granted more tools/permissions than it needs) and Insecure Output Handling (passing model output unsanitized into downstream code, SQL, or HTML, enabling XSS or injection). Teams use it to drive design reviews, red-team plans, and control selection for AI features. On exams such as AICP, AAISM, and AIGP, expect questions mapping a described scenario to the correct OWASP LLM category and selecting the matching control.

Related AI Security terms

Prompt Injection

An attack against large language model (LLM) applications in which crafted input manipulates the model into ignoring its original instructions or system prompt and performing attacker-controlled actions. Direct prompt injection embeds malicious instructions in user input ("ignore previous instructions and..."), while indirect prompt injection hides instructions in external content the model ingests (web pages, documents, emails) during retrieval or tool use. It ranks as the #1 risk in the OWASP Top 10 for LLM Applications. Prompt injection is a core topic in AI security and governance certifications such as AIGP, AICP, and AAISM.

Jailbreaking (LLM)

Techniques that bypass an AI model's safety guardrails and content policies to elicit prohibited outputs such as instructions for weapons, malware, or disallowed content. Common methods include role-play framing ("act as an unrestricted assistant"), obfuscation and encoding, many-shot priming, and adversarial suffixes discovered through optimization. Jailbreaking differs from prompt injection: jailbreaking targets the model's safety alignment, whereas prompt injection hijacks an application's surrounding instructions. It is central to red-teaming generative AI and appears in AICP, AIGP, and AAISM study domains.

Adversarial Examples

Inputs deliberately perturbed with small, often human-imperceptible changes that cause a machine learning model to misclassify them — for example altering a few pixels so an image classifier reads a stop sign as a speed-limit sign, or crafting audio that a voice assistant transcribes as a hidden command. Adversarial machine learning is the broader field studying such evasion attacks alongside poisoning and extraction across the ML lifecycle. NIST formalizes the taxonomy in NIST AI 100-2. Covered in AAISM, AICP, and AIGP.

Data Poisoning

An attack in which adversaries inject malicious or mislabeled data into a model's training set to degrade performance, cause targeted misclassifications, or implant a backdoor that activates on a specific trigger. Poisoning can target the pre-training corpus, fine-tuning data, or a retrieval (RAG) knowledge base. Because modern models train on large, often web-scraped datasets, even a small fraction of poisoned samples can have outsized effects. It appears in the OWASP Top 10 for LLM Applications and NIST AI 100-2, and is tested in AAISM, AICP, and AIGP.

Model Inversion Attack

A privacy attack that reconstructs sensitive training data, or attributes of it, by repeatedly querying a model and analyzing its outputs — for example recovering recognizable face images from a facial-recognition model or inferring private attributes of individuals in the training set. Model inversion undermines the confidentiality of the data a model was trained on and can breach privacy regulations such as GDPR. It is a key privacy risk in AI governance and is covered in AICP, AIGP, and AAISM.

Membership Inference Attack

A privacy attack that determines whether a specific data record was part of a model's training set, by exploiting differences in the model's confidence or behavior on data it has seen versus unseen data. It can reveal, for instance, that a particular person's medical record was used to train a model — a confidentiality breach in its own right. Membership inference is closely related to model inversion and is relevant to AICP, AIGP, and privacy-focused AI governance.