Deployer AI Risk Register (DARR)
MR-046 | Governance & process | Organization scope

Inadequate evaluation, testing and benchmarking

Evaluation/testing is incomplete or unrepresentative (e.g. benchmark contamination, missing safety evals), giving false assurance.
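
Benchmark contamination, one of the examples named above, can be illustrated with a word n-gram overlap check between benchmark items and training text. The following is a minimal sketch, not a method prescribed by this register: the function names `ngrams` and `contamination_rate` and the default window of 8 words are assumptions chosen for illustration.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (empty if text is shorter than n words)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(benchmark_items: Iterable[str],
                       training_docs: Iterable[str],
                       n: int = 8) -> float:
    """Fraction of benchmark items that share at least one n-gram with the training docs.

    Hypothetical helper: the window size n is an illustrative assumption.
    """
    # Collect every n-gram seen anywhere in the training text.
    train_grams: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)

    items = list(benchmark_items)
    if not items:
        return 0.0

    # An item is flagged when any of its n-grams also appears in training text.
    flagged = sum(1 for item in items if ngrams(item, n) & train_grams)
    return flagged / len(items)
```

A non-zero rate suggests that some benchmark scores may reflect memorised text rather than capability. Exact n-gram matching misses paraphrased leakage, so a zero rate does not establish that a benchmark is clean.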

Risk family: Governance & process
MIT domain: 6. Socioeconomic and Environmental
MIT subdomain: 6.5 Governance failure
AI type: GPAI, Classical_ML, Agentic
Scope: Organization
Source standard: MIT AI Risk Repository v4

Provenance

Source standard: MIT AI Risk Repository v4
Source frameworks: Gabriel2024, Gipiškis2024, IBM2025
ISO/IEC references: 23894 obj A.9; src 2; mech B.8 | 42001 ctrl A.6.2.4
EU AI Act articles: Art. 9 | Art. 15
GPAI Code of Practice: S&S Ch. Commitments 2-5

Framework crosswalk

Every framework item mapped to this risk. Items marked "partial" overlap only in part.

Sources: frameworks that contributed to the register
ISO 23894 (1)
  • A.9: ISO/IEC 23894 Annex A, A.9
ISO 42001 (1)
  • A.6.2.4: ISO/IEC 42001 Annex A, A.6.2.4
EU AI Act (3)
  • Art. 15
  • Art. 9
  • CoP S&S Ch. Commitments 2-5
Cross-checks: frameworks mapped in to test coverage
IBM (5)
  • ibm-incomplete-ai-agent-evaluation: Incomplete AI agent evaluation
  • ibm-incorrect-risk-testing: Incorrect risk testing
  • ibm-lack-of-testing-diversity: Lack of testing diversity
  • ibm-reproducibility: Reproducibility (partial)
  • ibm-unrepresentative-risk-testing: Unrepresentative risk testing

Part of the Deployer AI Risk Register, an open-source resource developed by MindXO. Version 1.0, 3 July 2026. Derived from the MIT AI Risk Repository (V4, December 2025) under CC BY 4.0; an independent derivative work, not endorsed by or affiliated with MIT. Sub-risk decomposition references MITRE ATLAS™ v5.6.0 (© 2021-2026 The MITRE Corporation, reproduced and distributed with permission). ISO/IEC and EU AI Act references are by number only. License: CC BY 4.0.