MR-046 Governance & process Organization scope

Inadequate evaluation, testing and benchmarking

Evaluation or testing is incomplete or unrepresentative (e.g. benchmark contamination, missing safety evals), giving false assurance.
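As an illustration of one item named above, benchmark contamination, the following is a minimal sketch of a word n-gram overlap check between benchmark items and training text. It is not part of the register or of any cited framework. The function names, the whitespace tokenisation and the default n-gram length are assumptions chosen for the example; a real check would need proper tokenisation and a corpus-scale index.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (hypothetical helper)."""
    tokens = text.lower().split()
    # range() is empty when the text has fewer than n tokens, so the set is empty.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(
    benchmark_items: Iterable[str],
    training_docs: Iterable[str],
    n: int = 8,  # assumed window size, not taken from the source
) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the training text."""
    train_grams: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)

    items = list(benchmark_items)
    if not items:
        return 0.0

    # An item is flagged when any of its n-grams also occurs in the training text.
    flagged = sum(1 for item in items if ngrams(item, n) & train_grams)
    return flagged / len(items)
```

A non-zero rate suggests that scores on the flagged items may overstate capability. A zero rate does not rule contamination out, since paraphrased or translated copies would not be caught by exact n-gram matching.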

Risk family: Governance & process
MIT domain: 6. Socioeconomic and Environmental
MIT subdomain: 6.5 > Governance failure
AI type: GPAI, Classical_ML, Agentic
Scope: Organization
Source standard: MIT AI Risk Repository v4

Provenance

Source standard

MIT AI Risk Repository v4

Source frameworks

Gabriel2024, Gipiškis2024, IBM2025

ISO/IEC references

23894 obj A.9; src 2; mech B.8 | 42001 ctrl A.6.2.4

EU AI Act articles

Art. 9 | Art. 15

GPAI Code of Practice

S&S Ch. Commitments 2-5

Framework crosswalk

Every framework item mapped to this risk. Items marked "partial" overlap only in part.

Sources: frameworks that contributed to the register

ISO 23894 (1)

A.9: ISO/IEC 23894 Annex A, A.9

ISO 42001 (1)

A.6.2.4: ISO/IEC 42001 Annex A, A.6.2.4

EU AI Act (3)

Art. 15
Art. 9
CoP S&S Ch. Commitments 2-5

Cross-checks: frameworks mapped in to test coverage

IBM (5)

ibm-incomplete-ai-agent-evaluation: Incomplete AI agent evaluation
ibm-incorrect-risk-testing: Incorrect risk testing
ibm-lack-of-testing-diversity: Lack of testing diversity
ibm-reproducibility: Reproducibility (partial)
ibm-unrepresentative-risk-testing: Unrepresentative risk testing

More in Governance & process

MR-037 MR-042 MR-043 MR-045 MR-054 MR-062 MR-063 MR-066 MR-067 MR-068 MR-069 MR-070

Part of the Deployer AI Risk Register, an open-source resource developed by MindXO. Version 1.0, 3 July 2026. Derived from the MIT AI Risk Repository (V4, December 2025) under CC BY 4.0; an independent derivative work, not endorsed by or affiliated with MIT. Sub-risk decomposition references MITRE ATLAS™ v5.6.0 (© 2021-2026 The MITRE Corporation, reproduced and distributed with permission). ISO/IEC and EU AI Act references are by number only. License: CC BY 4.0.