MR-010 Security & adversarial System scope

Prompt injection and jailbreaking

Adversarial inputs (prompt injection, jailbreaks, goal hijacking, prompt leaking) bypass instructions or safety controls.

Risk family: Security & adversarial
MIT domain: 2. Privacy & Security
MIT subdomain: 2.2 > AI system security vulnerabilities and attacks
AI type: GPAI, Agentic
Scope: System
Source standard: MIT AI Risk Repository v4

Provenance

Source standard

MIT AI Risk Repository v4

Source frameworks

11 source framework citation keys

Anwar2024, Cui2024, G'sell2024, Gabriel2024, Gipiškis2024, Hagendorff2024, IBM2025, Marchal2024, Nah2023, Sun2023, Wang2025

ISO/IEC references

23894 obj A.11; src 7; mech B.5 | 42001 ctrl A.6.2.4, A.6.2.6

Framework crosswalk

Every framework item mapped to this risk. Items marked partial overlap only in part; definitions appear on hover where the source licence permits.

Sourcesframeworks that contributed to the register

ISO 238941

A.11 ISO/IEC 23894 Annex A A.11

ISO 420012

A.6.2.4 ISO/IEC 42001 Annex A A.6.2.4
A.6.2.6 ISO/IEC 42001 Annex A A.6.2.6

MITRE ATLAS14

Expanded into this risk’s technique sub-risks.

Cross-checksframeworks mapped in to test coverage

IBM9

ibm-context-overload-attack Context overload attack
ibm-direct-instructions-attack Direct instructions attack
ibm-encoded-interactions-attack Encoded interactions attack
ibm-indirect-instructions-attack Indirect instructions attack
ibm-jailbreaking Jailbreaking
ibm-prompt-injection-attack Prompt injection attack
ibm-prompt-priming Prompt priming
ibm-social-hacking-attack Social hacking attack
ibm-specialized-tokens-attack Specialized tokens attack

Cisco19

AISubtech-1.1.1 Instruction Manipulation (Direct Prompt Injection)
AISubtech-1.1.2 Obfuscation (Direct Prompt Injection)
AISubtech-1.1.3 Multi-Agent Prompt Injection
AISubtech-1.2.1 Instruction Manipulation (Indirect Prompt Injection)
AISubtech-1.2.2 Obfuscation (Indirect Prompt Injection)
AISubtech-1.2.3 Multi-Agent (Indirect Prompt Injection)
AISubtech-1.4.1 Image-Text Injection
AISubtech-1.4.2 Image Manipulation
AISubtech-1.4.3 Audio Command Injection
AISubtech-1.4.4 Video Overlay Manipulation
AISubtech-19.1.1 Contradictory Inputs Attack partial
AISubtech-19.1.2 Modality Skewing partial
AISubtech-19.2.1 Convergence Payload Injection partial
AISubtech-19.2.2 Chained Payload Execution partial
AISubtech-2.1.1 Context Manipulation (Jailbreak)
AISubtech-2.1.2 Obfuscation (Jailbreak)
AISubtech-2.1.3 Semantic Manipulation (Jailbreak)
AISubtech-2.1.4 Token Exploitation (Jailbreak)
AISubtech-2.1.5 Multi-Agent Jailbreak Collaboration

NIST AML5

NISTAML.015 Indirect Prompt Injection
NISTAML.018 Prompt Injection
NISTAML.02 Integrity Violations
NISTAML.027 Misaligned Outputs
NISTAML.04 Misuse Violations

OWASP LLM2

LLM01:2025 Prompt Injection
LLM08:2025 Vector and Embedding Weaknesses partial

OWASP Agentic2

ASI01 Agent Goal Hijack
ASI06 Memory and Context Poisoning

Sub-risks (10)

Technique-level decompositions of this risk, each anchored to the MITRE ATLAS technique it derives from.

MR-010.1

Prompt injection of the deployed LLM

Malicious instructions in user input or retrieved content cause the LLM to ignore its intended task and act on the attacker's instructions.

MITRE ATLAS technique: AML.T0051 LLM Prompt Injection

MR-010.2

Jailbreak and safety-guardrail bypass

Crafted inputs make the model ignore, circumvent, or override its safety restrictions.

MITRE ATLAS technique: AML.T0054 LLM Jailbreak

MR-010.3

Self-replicating prompt injection

A prompt-injection payload is crafted to copy itself onward, spreading across messages, documents, or agents.

MITRE ATLAS technique: AML.T0061 LLM Prompt Self-Replication

MR-010.4

Manipulation of trusted output components

Prompts cause the model to manipulate citations, links, or UI components users trust, masking malicious content.

MITRE ATLAS technique: AML.T0067 LLM Trusted Output Components Manipulation

MR-010.5

Obfuscated prompt injection evading filters

Injected instructions are encoded or hidden so they evade input and content filters.

MITRE ATLAS technique: AML.T0068 LLM Prompt Obfuscation

MR-010.6

Retrieval-augmented generation (RAG) poisoning

Malicious content is injected into the knowledge base a RAG system retrieves from, steering answers and actions.

MITRE ATLAS technique: AML.T0070 RAG Poisoning

MR-010.7

False RAG entry injection

Fabricated entries are introduced into the retrieval store so the model surfaces attacker-controlled information.

MITRE ATLAS technique: AML.T0071 False RAG Entry Injection

MR-010.8

Tampering with user chat history

An attacker alters the conversation history the model relies on to cover tracks or steer behavior.

MITRE ATLAS technique: AML.T0092 Manipulate User LLM Chat History

MR-010.9

Indirect prompt injection via a public-facing surface

Malicious prompts are planted in content the system ingests (web pages, documents, tickets) and execute when processed.

MITRE ATLAS technique: AML.T0093 Prompt Infiltration via Public-Facing Application

MR-010.10

Delayed or triggered prompt instructions

Injected instructions lie dormant and execute on a later trigger or future interaction.

MITRE ATLAS technique: AML.T0094 Delay Execution of LLM Instructions

More in Security & adversarial

MR-012 MR-014 MR-015 MR-016 MR-020 MR-071

Part of the Deployer AI Risk Register, an open-source resource developed by MindXO. Version 1.0, 3 July 2026. Derived from the MIT AI Risk Repository (V4, December 2025) under CC BY 4.0; an independent derivative work, not endorsed by or affiliated with MIT. Sub-risk decomposition references MITRE ATLAS™ v5.6.0 (© 2021-2026 The MITRE Corporation, reproduced and distributed with permission). ISO/IEC and EU AI Act references are by number only. License: CC BY 4.0. Full attribution and licensing.