DARR
MR-010 Security & adversarial System scope

Prompt injection and jailbreaking

Adversarial inputs (prompt injection, jailbreaks, goal hijacking, prompt leaking) bypass instructions or safety controls.

Risk family
Security & adversarial
MIT domain
2. Privacy & Security
MIT subdomain
2.2 > AI system security vulnerabilities and attacks
AI type
GPAI, Agentic
Scope
System
Source standard
MIT AI Risk Repository v4

Provenance

Source standard
MIT AI Risk Repository v4
Source frameworks
11 source framework citation keys
Anwar2024, Cui2024, G'sell2024, Gabriel2024, Gipiškis2024, Hagendorff2024, IBM2025, Marchal2024, Nah2023, Sun2023, Wang2025
ISO/IEC references
23894 obj A.11; src 7; mech B.5 | 42001 ctrl A.6.2.4, A.6.2.6

Framework crosswalk

Every framework item mapped to this risk. Items marked partial overlap only in part; definitions appear on hover where the source licence permits.

Sourcesframeworks that contributed to the register
ISO 238941
  • A.11 ISO/IEC 23894 Annex A A.11
ISO 420012
  • A.6.2.4 ISO/IEC 42001 Annex A A.6.2.4
  • A.6.2.6 ISO/IEC 42001 Annex A A.6.2.6
MITRE ATLAS14

Expanded into this risk’s technique sub-risks.

Cross-checksframeworks mapped in to test coverage
IBM9
  • ibm-context-overload-attack Context overload attack
  • ibm-direct-instructions-attack Direct instructions attack
  • ibm-encoded-interactions-attack Encoded interactions attack
  • ibm-indirect-instructions-attack Indirect instructions attack
  • ibm-jailbreaking Jailbreaking
  • ibm-prompt-injection-attack Prompt injection attack
  • ibm-prompt-priming Prompt priming
  • ibm-social-hacking-attack Social hacking attack
  • ibm-specialized-tokens-attack Specialized tokens attack
Cisco19
  • AISubtech-1.1.1 Instruction Manipulation (Direct Prompt Injection)
  • AISubtech-1.1.2 Obfuscation (Direct Prompt Injection)
  • AISubtech-1.1.3 Multi-Agent Prompt Injection
  • AISubtech-1.2.1 Instruction Manipulation (Indirect Prompt Injection)
  • AISubtech-1.2.2 Obfuscation (Indirect Prompt Injection)
  • AISubtech-1.2.3 Multi-Agent (Indirect Prompt Injection)
  • AISubtech-1.4.1 Image-Text Injection
  • AISubtech-1.4.2 Image Manipulation
  • AISubtech-1.4.3 Audio Command Injection
  • AISubtech-1.4.4 Video Overlay Manipulation
  • AISubtech-19.1.1 Contradictory Inputs Attack partial
  • AISubtech-19.1.2 Modality Skewing partial
  • AISubtech-19.2.1 Convergence Payload Injection partial
  • AISubtech-19.2.2 Chained Payload Execution partial
  • AISubtech-2.1.1 Context Manipulation (Jailbreak)
  • AISubtech-2.1.2 Obfuscation (Jailbreak)
  • AISubtech-2.1.3 Semantic Manipulation (Jailbreak)
  • AISubtech-2.1.4 Token Exploitation (Jailbreak)
  • AISubtech-2.1.5 Multi-Agent Jailbreak Collaboration
NIST AML5
  • NISTAML.015 Indirect Prompt Injection
  • NISTAML.018 Prompt Injection
  • NISTAML.02 Integrity Violations
  • NISTAML.027 Misaligned Outputs
  • NISTAML.04 Misuse Violations
OWASP LLM2
  • LLM01:2025 Prompt Injection
  • LLM08:2025 Vector and Embedding Weaknesses partial
OWASP Agentic2
  • ASI01 Agent Goal Hijack
  • ASI06 Memory and Context Poisoning

Sub-risks (10)

Technique-level decompositions of this risk, each anchored to the MITRE ATLAS technique it derives from.

MR-010.1

Prompt injection of the deployed LLM

#

Malicious instructions in user input or retrieved content cause the LLM to ignore its intended task and act on the attacker's instructions.

MITRE ATLAS technique: AML.T0051 LLM Prompt Injection
MR-010.2

Jailbreak and safety-guardrail bypass

#

Crafted inputs make the model ignore, circumvent, or override its safety restrictions.

MITRE ATLAS technique: AML.T0054 LLM Jailbreak
MR-010.3

Self-replicating prompt injection

#

A prompt-injection payload is crafted to copy itself onward, spreading across messages, documents, or agents.

MITRE ATLAS technique: AML.T0061 LLM Prompt Self-Replication
MR-010.4

Manipulation of trusted output components

#

Prompts cause the model to manipulate citations, links, or UI components users trust, masking malicious content.

MITRE ATLAS technique: AML.T0067 LLM Trusted Output Components Manipulation
MR-010.5

Obfuscated prompt injection evading filters

#

Injected instructions are encoded or hidden so they evade input and content filters.

MITRE ATLAS technique: AML.T0068 LLM Prompt Obfuscation
MR-010.6

Retrieval-augmented generation (RAG) poisoning

#

Malicious content is injected into the knowledge base a RAG system retrieves from, steering answers and actions.

MITRE ATLAS technique: AML.T0070 RAG Poisoning
MR-010.7

False RAG entry injection

#

Fabricated entries are introduced into the retrieval store so the model surfaces attacker-controlled information.

MITRE ATLAS technique: AML.T0071 False RAG Entry Injection
MR-010.8

Tampering with user chat history

#

An attacker alters the conversation history the model relies on to cover tracks or steer behavior.

MITRE ATLAS technique: AML.T0092 Manipulate User LLM Chat History
MR-010.9

Indirect prompt injection via a public-facing surface

#

Malicious prompts are planted in content the system ingests (web pages, documents, tickets) and execute when processed.

MITRE ATLAS technique: AML.T0093 Prompt Infiltration via Public-Facing Application
MR-010.10

Delayed or triggered prompt instructions

#

Injected instructions lie dormant and execute on a later trigger or future interaction.

MITRE ATLAS technique: AML.T0094 Delay Execution of LLM Instructions

More in Security & adversarial

Part of the Deployer AI Risk Register, an open-source resource developed by MindXO. Version 1.0, 3 July 2026. Derived from the MIT AI Risk Repository (V4, December 2025) under CC BY 4.0; an independent derivative work, not endorsed by or affiliated with MIT. Sub-risk decomposition references MITRE ATLAS™ v5.6.0 (© 2021-2026 The MITRE Corporation, reproduced and distributed with permission). ISO/IEC and EU AI Act references are by number only. License: CC BY 4.0. Full attribution and licensing.