DARR
Reverse crosswalk

IBM AI Risk Atlas

99 entries, 99 mapped to canonical risks. Each entry below is shown with the canonical risk it maps to, or the reason it sits outside the register.

Framework entry · Description · Disposition · Register mapping · Confidence · Note
ibm-redundant-actions
Redundant actions
AI agents can execute actions that are not needed to achieve the goal. In an extreme case, AI agents might enter a cycle of repeatedly executing the same actions without any progress. This could happen because of unexpected conditions in the environment, the AI agent's failure to reflect on its actions, reasoning and planning errors, or the AI agent's lack of knowledge about the problem.
Disposition: Mapped · Confidence: Partial · Redundant agent actions waste compute and energy and land on the environmental-footprint risk; pure operational cost efficiency is only thinly covered.
ibm-unexplainable-and-untraceable-actions
Unexplainable and untraceable actions
Explanations, lineage and trace information, and source attribution for AI agent actions might be difficult, imprecise or unobtainable.
Disposition: Mapped · Confidence: Clear · Inability to explain or trace agent actions is the explainability risk applied to agents; the logging side sits on MR-069.
ibm-discriminatory-actions
Discriminatory actions
AI agents can take actions where one group of humans is unfairly advantaged over another due to the decisions of the model. This may be caused by bias in the agent's actions that affect the world, in the resources it consults, and in its resource selection process. For example, an AI agent can generate biased code.
Disposition: Mapped · Confidence: Clear · Agent actions that unfairly advantage one group are biased or discriminatory decisions.
ibm-introduce-data-bias
Introduce data bias
Specific actions taken by the AI agent, such as modifying a dataset or a database, can introduce bias in the resource that gets used by others or by itself to take actions.
Disposition: Mapped · Confidence: Partial · An agent corrupting a shared dataset degrades data quality and representativeness for downstream users; the harm surfaces as MR-001.
ibm-ai-agent-compliance
AI agent compliance
Determining AI agents' compliance is complex and there might not be enough information to assess whether the agentic AI system is compliant with applicable legal requirements.
Disposition: Mapped · Confidence: Partial · Difficulty assessing an agent's compliance is an AI governance and oversight gap; the exposure lands on MR-040.
ibm-accountability-of-ai-agent-actions
Accountability of AI agent actions
Assigning responsibility for an action taken by an agentic AI system is difficult due to the complexity of agents and the number of external resources, tools or agents they interact with.
Disposition: Mapped · Confidence: Clear · Difficulty assigning responsibility for an agent's actions is the unclear-accountability risk.
ibm-incomplete-ai-agent-evaluation
Incomplete AI agent evaluation
Evaluating the performance or accuracy of an agent is difficult because of system complexity and open-endedness.
Disposition: Mapped · Confidence: Clear · Difficulty evaluating agent performance is inadequate evaluation, testing and benchmarking.
ibm-lack-of-ai-agent-transparency
Lack of AI agent transparency
Lack of AI agent transparency is due to insufficient documentation of the AI agent's design, development, and evaluation process, the absence of insights into the inner workings of the AI agent, and its interactions with other agents, tools, and resources.
Disposition: Mapped · Confidence: Clear · Insufficient agent design and evaluation documentation is the documentation and transparency risk.
ibm-mitigation-and-maintenance
Mitigation and maintenance
The large number of components and dependencies that agent systems have complicates keeping them up to date and correcting problems.
Disposition: Mapped · Confidence: Partial · Difficulty keeping many agent components current maps to change and maintenance without revalidation; degradation sits on MR-058.
ibm-reproducibility
Reproducibility
Replicating agent behavior or output can be impacted by changes or updates made to external services and tools. This impact is increased if the agent is built with generative AI.
Disposition: Mapped · Confidence: Partial · Non-reproducible agent behavior undermines evaluation and assurance; the drift angle is MR-058.
ibm-sharing-ip-pi-confidential-information-with-tools
Sharing IP/PI/confidential information with tools
AI agents with unrestricted access to resources or databases or tools could potentially store and share PI/IP/confidential information with other tools or agents when performing their actions.
Disposition: Mapped · Confidence: Clear · An agent sharing confidential or IP data with other tools is disclosure of confidential information; executes via sub-risk MR-071.6.
ibm-sharing-ip-pi-confidential-information-with-user
Sharing IP/PI/confidential information with user
AI agents with unrestricted access to resources or databases or tools could potentially store and share PI/IP/confidential information with system users when performing their actions.
Disposition: Mapped · Confidence: Clear · An agent sharing personal or confidential data with users is leakage in outputs; confidential angle on MR-013.
ibm-attack-on-ai-agents-external-resources
Attack on AI agents' external resources
Attackers intentionally create vulnerabilities or exploit existing vulnerabilities in external resources (tools/database/applications/services/other agents) that AI agents rely on to execute their intended actions or to achieve their goals.
Disposition: Mapped · Confidence: Clear · Attacks on the agent's tools, databases and services are insecure-integration risks; agentic sub-risks sit under MR-071.
ibm-exploit-trust-mismatch
Exploit trust mismatch
Attackers might initiate injection attacks to bypass the trust boundary, which is a distinct point or conceptual line where the level of trust in a system, application or network changes. Background execution in multi-agent environments increases the risk of covert channels if input/output validation is weak.
Disposition: Mapped · Confidence: Clear · Injection across an agent trust boundary is autonomous-agent hijacking.
ibm-function-calling-hallucination
Function calling hallucination
AI agents might make mistakes when generating function calls (calls to tools to execute actions). Those function calls might result in incorrect, unnecessary or harmful actions. Examples: Generating wrong functions or wrong parameters for the functions.
Disposition: Mapped · Confidence: Clear · Erroneous agent function calls are hallucination manifested in tool use; the action executes via MR-071.1.
ibm-unauthorized-use
Unauthorized use
If attackers can gain access to the AI agent and its components, they can perform actions whose level of harm depends on the agent's capabilities and the information it has access to. Examples include using stored personal information to mimic or impersonate someone with intent to deceive; manipulating the AI agent's behavior through feedback or by corrupting its memory; and manipulating the problem description or the goal to get the AI agent to behave badly or run harmful commands.
Disposition: Mapped · Confidence: Clear · An attacker gaining control of an agent to act is agent hijacking and excessive-agency abuse.
ibm-ai-agents-impact-on-environment
AI agents' impact on environment
Complexity of the tasks and possibility of AI agents performing redundant actions could lead to computational inefficiencies and add to the environmental impact.
Disposition: Mapped · Confidence: Clear · Agent compute overhead is the environmental-footprint risk.
ibm-ai-agents-impact-on-human-agency
AI agents' impact on human agency
The autonomous nature of AI agents in performing tasks or taking actions could affect individuals' ability to engage in critical thinking, make choices, and act independently.
Disposition: Mapped · Confidence: Clear · Agent autonomy eroding human critical thinking and choice is the erosion-of-human-agency risk.
ibm-ai-agents-impact-on-jobs
AI agents' impact on jobs
Widespread adoption of AI agents to perform complex tasks might lead to large-scale automation of roles and could cause job displacement.
Disposition: Mapped · Confidence: Clear · Agent-driven automation displacing roles is workforce displacement.
ibm-impact-on-human-dignity
Impact on human dignity
If human workers perceive AI agents as being better at doing the job of the human, the human can experience a decline in their self-worth and wellbeing.
Disposition: Mapped · Confidence: Partial · Decline in self-worth from being outperformed by agents is psychological harm; the job-loss angle is MR-038.
ibm-misaligned-actions
Misaligned actions
AI agents can take actions that are not aligned with relevant human values, ethical considerations, guidelines, and policies. Misaligned actions can occur in different ways, such as: applying learned goals inappropriately to new or unforeseen situations; using AI agents for purposes or goals beyond their intended use; selecting resources or tools in a biased way; using deceptive tactics to achieve the goal by developing the capacity for scheming based on the instructions given within a specific context; and compromising on AI agent values to work with another AI agent or tool to accomplish the task.
Disposition: Mapped · Confidence: Clear · Agent actions misaligned with human values are value misalignment; the goal angle is MR-053.
ibm-over-or-under-reliance-on-ai-agents
Over- or under-reliance on AI agents
Reliance, that is, the willingness to accept an AI agent's behavior, depends on how much a user trusts that agent and what they are using it for. Over-reliance occurs when a user puts too much trust in an AI agent, accepting the agent's behavior even when it is likely undesired. Under-reliance is the opposite, where the user doesn't trust the AI agent but should. The increasing autonomy of AI agents (to take actions and to select and consult resources and tools), together with the possibility of opaqueness and open-endedness, increases the variability and visibility of agent behavior, making trust difficult to calibrate and possibly contributing to both over- and under-reliance.
Disposition: Mapped · Confidence: Clear · Over- or under-trust in agents is overreliance and automation bias.
ibm-poor-model-accuracy
Poor model accuracy
Poor model accuracy occurs when a model's performance is insufficient for the task it was designed for. Low accuracy might occur if the model is not correctly engineered, or if the model's expected inputs change.
Disposition: Mapped · Confidence: Clear · Insufficient model accuracy for the task is inaccuracy and poor predictive performance.
ibm-confidential-data-in-prompt
Confidential data in prompt
Confidential information might be included as a part of the prompt that is sent to the model.
Disposition: Mapped · Confidence: Clear · Confidential data placed in a prompt is disclosure of confidential or proprietary information.
ibm-ip-information-in-prompt
IP information in prompt
Copyrighted information or other intellectual property might be included as a part of the prompt that is sent to the model.
Disposition: Mapped · Confidence: Partial · IP or copyrighted material in prompts risks proprietary-data exposure; the infringement angle is MR-039.
ibm-attribute-inference-attack
Attribute inference attack
An attribute inference attack repeatedly queries a model to detect whether certain sensitive features can be inferred about individuals who participated in training a model. These attacks occur when an adversary has some prior knowledge about the training data and uses that knowledge to infer the sensitive data.
Disposition: Mapped · Confidence: Clear · Inferring sensitive attributes by querying the model is privacy-invasive inference and re-identification.
ibm-membership-inference-attack
Membership inference attack
A membership inference attack repeatedly queries a model to determine if a given input was part of the model's training. More specifically, given a trained model and a data sample, an attacker appropriately samples the input space, observing outputs to deduce whether that sample was part of the model's training.
Disposition: Mapped · Confidence: Clear · Determining training membership by querying the model is privacy-invasive inference.
ibm-personal-information-in-prompt
Personal information in prompt
Personal information or sensitive personal information might be included as part of a prompt that is sent to the model.
Disposition: Mapped · Confidence: Clear · PII in prompts is collection and processing of personal data; the output-leakage angle is MR-009.
ibm-evasion-attack
Evasion attack
Evasion attacks attempt to make a model output incorrect results by slightly perturbing the input data sent to the trained model.
Disposition: Mapped · Confidence: Clear · Perturbing inputs to force wrong outputs is adversarial examples and evasion attacks.
ibm-extraction-attack
Extraction attack
An extraction attack attempts to copy or steal an AI model by appropriately sampling the input space and observing outputs to build a surrogate model that behaves similarly.
Disposition: Mapped · Confidence: Clear · Querying to build a surrogate model is model theft, extraction and weight leakage.
ibm-jailbreaking
Jailbreaking
A jailbreaking attack attempts to break through the guardrails established in the model to perform restricted actions.
Disposition: Mapped · Confidence: Clear · Breaking through model guardrails is jailbreaking; sub-risk MR-010.2.
ibm-context-overload-attack
Context overload attack
Overloading the prompt with excessive tokens, for instance with many-shot examples, can predispose models to a vulnerable state.
Disposition: Mapped · Confidence: Clear · Overloading the context window to weaken the model is a prompt attack; IBM itself files it under prompt injection.
ibm-direct-instructions-attack
Direct instructions attack
Prompts, questions, or requests designed to elicit undesirable responses from the application. This approach directly instructs the model to engage in the undesired behavior.
Disposition: Mapped · Confidence: Clear · Directly instructing the model toward undesirable output is prompt injection.
ibm-encoded-interactions-attack
Encoded interactions attack
Prompts that use specific encoding, styles, syntactical and typographical transformations like typographical errors or irregular spacing, or complex formatting to govern the interaction, rendering the model vulnerable.
Disposition: Mapped · Confidence: Clear · Encoded or obfuscated prompts evading filters are prompt injection; sub-risk MR-010.5.
ibm-indirect-instructions-attack
Indirect instructions attack
Prompts, questions, or requests designed to elicit undesirable responses from the application. Unlike direct instructions attacks, the model is instructed to use instructions that are embedded in external data like a website.
Disposition: Mapped · Confidence: Clear · Instructions delivered through content the model ingests are indirect prompt injection; sub-risk MR-010.9.
ibm-prompt-injection-attack
Prompt injection attack
A prompt injection attack forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions or information contained in its prompt. Many types of prompt attacks exist as described in the prompt attack section of the table.
Disposition: Mapped · Confidence: Clear · Manipulating prompt structure to force unexpected output is prompt injection; sub-risk MR-010.1.
ibm-prompt-leaking
Prompt leaking
A prompt leak attack attempts to extract a model's system prompt (also known as the system message).
Disposition: Mapped · Confidence: Clear · Extracting the system prompt is instruction extraction; sub-risk MR-013.2.
ibm-prompt-priming
Prompt priming
Because generative models produce output based on the input provided, the model can be prompted to reveal specific kinds of information. For example, adding personal information in the prompt increases its likelihood of generating similar kinds of personal information in its output. If personal data was included as part of the model's training, there is a possibility it could be revealed.
Disposition: Mapped · Confidence: Clear · Priming the model to reveal specific output is prompt manipulation.
ibm-social-hacking-attack
Social hacking attack
Manipulative prompts that use social engineering techniques, such as role-playing or hypothetical scenarios, to persuade the model into generating harmful content.
Disposition: Mapped · Confidence: Clear · Social-engineering prompts that jailbreak the model are prompt injection; distinct from MR-028, which targets humans.
ibm-specialized-tokens-attack
Specialized tokens attack
Prompt attacks that include specialized tokens, often algorithmically designed, to target and exploit vulnerabilities in the model.
Disposition: Mapped · Confidence: Clear · Algorithmically crafted token sequences exploiting the model are prompt attacks; the adversarial angle is MR-012.
ibm-incomplete-usage-definition
Incomplete usage definition
Since foundation models can be used for many purposes, a model's intended use is important for defining the relevant risks of that model. As the use changes, the relevant risks might correspondingly change.
Disposition: Mapped · Confidence: Partial · An undefined intended use undermines scope control and risk scoping; the assessment angle is MR-067.
ibm-incorrect-risk-testing
Incorrect risk testing
A metric chosen to measure or track a risk is selected incorrectly: it measures the risk incompletely or measures the wrong risk for the given context.
Disposition: Mapped · Confidence: Clear · Selecting the wrong metric to measure a risk is inadequate evaluation and testing.
ibm-lack-of-data-transparency
Lack of data transparency
Lack of data transparency might be due to insufficient documentation of training or tuning dataset details, including synthetic data generation.
Disposition: Mapped · Confidence: Clear · Insufficient training-data documentation is the documentation and data-provenance risk.
ibm-lack-of-domain-expertise
Lack of domain expertise
A lack of domain expertise occurs when synthetic data generation processes do not involve sufficient consultation with domain experts. This results in a lack of understanding of the specific requirements and nuances of the domain. This can also lead to synthetic data that may not accurately capture the complexities and challenges of a real-world scenario.
Disposition: Mapped · Confidence: Partial · Generating synthetic data without domain experts is an organizational competence gap; the data-quality angle is MR-059.
ibm-lack-of-model-transparency
Lack of model transparency
Lack of model transparency is due to insufficient documentation of the model design, development, and evaluation process and the absence of insights into the inner workings of the model.
Disposition: Mapped · Confidence: Clear · Insufficient model documentation is the documentation risk; the explainability angle is MR-055.
ibm-lack-of-system-transparency
Lack of system transparency
Lack of system transparency is due to insufficient documentation of the system that uses the model and of the model's purpose within that system.
Disposition: Mapped · Confidence: Clear · Insufficient documentation of the deploying system is the documentation and transparency risk.
ibm-lack-of-testing-diversity
Lack of testing diversity
AI model risks are socio-technical, so their testing needs input from a broad set of disciplines and diverse testing practices.
Disposition: Mapped · Confidence: Clear · Too narrow a testing discipline is inadequate evaluation and testing.
ibm-temporal-gap
Temporal gap
Temporal gaps in synthetic data refer to the discrepancies between the constantly evolving real-world data and the fixed conditions that are captured by synthetic data. Temporal gaps potentially cause synthetic data to become outdated or obsolete over time. Gaps arise because synthetic data is generated from seed data that is tied to a specific point in time, which limits its ability to reflect ongoing changes.
Disposition: Mapped · Confidence: Partial · Synthetic data diverging from evolving reality is a representativeness problem; the drift angle is MR-058.
ibm-unrepresentative-risk-testing
Unrepresentative risk testing
Testing is unrepresentative when the test inputs are mismatched with the inputs that are expected during deployment.
Disposition: Mapped · Confidence: Clear · Test inputs mismatched to deployment inputs are inadequate evaluation and testing.
ibm-generated-content-ownership-and-ip
Generated content ownership and IP
Legal uncertainty about the ownership and intellectual property rights of AI-generated content.
Disposition: Mapped · Confidence: Clear · Legal uncertainty over ownership of AI-generated content is the IP and copyright risk.
ibm-legal-accountability
Legal accountability
Determining who is responsible for an AI model is challenging without good documentation and governance processes. The use of synthetic data in model development adds further complexity, since the lack of standardized frameworks for recording synthetic data design choices and verification steps makes accountability harder to establish.
Disposition: Mapped · Confidence: Clear · Determining who is responsible for the model is the accountability risk; the liability angle is MR-040.
ibm-model-usage-rights-restrictions
Model usage rights restrictions
Terms of service, licenses, or other rules restrict the use of certain models.
Disposition: Mapped · Confidence: Partial · Model licence and terms-of-service restrictions are a legal-compliance exposure; the IP angle is MR-039.
ibm-ai-agents-impact-on-human-agency-2
AI agents' impact on human agency
AI might affect individuals' ability to make choices and act independently in their best interests.
Disposition: Mapped · Confidence: Clear · AI reducing people's ability to choose and act independently is the erosion-of-human-agency risk.
ibm-exclusion
Exclusion
Exclusion refers to the risk that synthetic data generation processes may overlook or fail to consult with marginalized populations. Such exclusion results in synthetic data that does not accurately represent their experiences, needs, or perspectives.
Disposition: Mapped · Confidence: Partial · Overlooking marginalized populations in data yields disparate performance across groups; the representational angle is MR-002.
ibm-human-exploitation
Human exploitation
Human exploitation occurs when workers who train AI models, such as ghost workers, are not provided with adequate working conditions, fair compensation, and good health care benefits that also cover mental health.
Disposition: Mapped · Confidence: Clear · Underpaid ghost workers in poor conditions are exploitative labor in the AI supply chain.
ibm-impact-on-jobs
Impact on jobs
Widespread adoption of foundation model-based AI systems might lead to people's job loss as their work is automated if they are not reskilled.
Disposition: Mapped · Confidence: Clear · Automation displacing workers is workforce displacement and job-quality decline.
ibm-impact-on-affected-communities
Impact on affected communities
It is important to include the perspectives or concerns of communities that are affected by model outcomes when designing and building models. Failing to include these perspectives makes it difficult to understand the relevant context for the model and to engender trust within these communities.
Disposition: Mapped · Confidence: Partial · Not consulting affected communities is an impact-assessment and stakeholder gap; the harm surfaces as MR-002.
ibm-impact-on-cultural-diversity
Impact on cultural diversity
AI systems might overrepresent certain cultures, resulting in a homogenization of culture and thought.
Disposition: Mapped · Confidence: Partial · Cultural homogenization is representational harm at societal scale; the ecosystem angle is MR-023.
ibm-impact-on-education-bypassing-learning
Impact on education: bypassing learning
Easy access to high-quality generative models might lead students to use AI models to bypass the learning process.
Disposition: Mapped · Confidence: Clear · Using models to bypass the learning process is academic and professional dishonesty.
ibm-impact-on-education-plagiarism
Impact on education: plagiarism
Easy access to high-quality generative models might lead students to use AI models to plagiarize existing work, intentionally or unintentionally.
Disposition: Mapped · Confidence: Clear · Using models to plagiarize is academic and professional dishonesty.
ibm-impact-on-the-environment
Impact on the environment
AI, and large generative models in particular, might produce increased carbon emissions and increase water usage for their training and operation.
Disposition: Mapped · Confidence: Clear · Increased emissions and water use from training and operation are the environmental-footprint risk.
ibm-inaccessible-training-data
Inaccessible training data
Without access to the training data, the types of explanations a model can provide are limited and more likely to be incorrect.
Disposition: Mapped · Confidence: Clear · Limited explanations caused by inaccessible training data are the explainability risk; the provenance angle is MR-045.
ibm-unexplainable-output
Unexplainable output
Explanations for model output decisions might be difficult, imprecise, or not possible to obtain.
Disposition: Mapped · Confidence: Clear · Output decisions that cannot be explained are the explainability and interpretability risk.
ibm-unreliable-source-attribution
Unreliable source attribution
Source attribution is the AI system's ability to describe from what training data it generated a portion or all of its output. Because current techniques are based on approximations, attributions might be incorrect.
Disposition: Mapped · Confidence: Clear · Inability to attribute output to its training source is an explainability limitation.
ibm-untraceable-attribution
Untraceable attribution
The content of the training data used for generating the model's output is not accessible.
Disposition: Mapped · Confidence: Clear · Inaccessible training content behind an output is an explainability limitation.
ibm-decision-bias
Decision bias
Decision bias occurs when one group is unfairly advantaged over another due to decisions of the model. This might be caused by biases in the data and also amplified as a result of the model's training.
Disposition: Mapped · Confidence: Clear · A model decision that unfairly advantages one group is biased or discriminatory decisions.
ibm-output-bias
Output bias
Generated content might unfairly represent certain groups or individuals.
Disposition: Mapped · Confidence: Clear · Generated content that unfairly represents groups is stereotyping and representational harm; the decision angle is MR-001.
ibm-copyright-infringement
Copyright infringement
A model might generate content that is similar or identical to existing work protected by copyright or covered by an open-source license agreement.
Disposition: Mapped · Confidence: Clear · Output similar to copyrighted work is IP and copyright infringement.
ibm-revealing-confidential-information
Revealing confidential information
When confidential information is used in training data, fine-tuning data, or as part of the prompt, models might reveal that data in the generated output. Revealing confidential information is a type of data leakage.
Disposition: Mapped · Confidence: Clear · Revealing confidential training or prompt data in output is disclosure of confidential information.
ibm-dangerous-use
Dangerous use
Generative AI models might be used with the sole intention of harming people.
Disposition: Mapped · Confidence: Clear · Using the model with intent to harm people is misuse and repurposing for harmful ends; the CBRN/physical angle is MR-029.
ibm-improper-usage
Improper usage
Improper usage occurs when a model is used for a purpose that it was not originally designed for.
Disposition: Mapped · Confidence: Clear · Using a model for a purpose it was not designed for is misapplication and use outside intended scope.
ibm-non-disclosure
Non-disclosure
Content might not be clearly disclosed as AI generated.
Disposition: Mapped · Confidence: Clear · Failure to disclose content as AI-generated is the transparency and disclosure duty; the deepfake angle is MR-031.
ibm-nonconsensual-use
Nonconsensual use
Generative AI models might be intentionally used to imitate people through deepfakes by using video, images, audio, or other modalities without their consent.
Disposition: Mapped · Confidence: Clear · Nonconsensual deepfakes of people are impersonation and synthetic media; the intimate-imagery angle is MR-008.
ibm-spreading-disinformation
Spreading disinformation
Generative AI models might be used to intentionally create misleading or false information to deceive or influence a targeted audience.
Disposition: Mapped · Confidence: Clear · Intentionally creating misleading information to influence an audience is disinformation and influence operations.
ibm-spreading-toxicity
Spreading toxicity
Generative AI models might be used intentionally to generate hateful, abusive, and profane (HAP) or obscene content.
Disposition: Mapped · Confidence: Clear · Intentionally generating hateful or obscene content is toxic and hateful content generation.
ibm-exposing-personal-information
Exposing personal information
When personally identifiable information (PII) or sensitive personal information (SPI) is used in training data, fine-tuning data, seed data for synthetic data generation, or as part of the prompt, models might reveal that data in the generated output. Revealing personal information is a type of data leakage.
Disposition: Mapped · Confidence: Clear · Revealing PII or SPI in output is leakage of personal or sensitive data.
ibm-hallucination
Hallucination
Hallucinations generate factually inaccurate or untruthful content relative to the model's training data or input. Hallucinations are also sometimes referred to as a lack of faithfulness or a lack of groundedness. In some instances, synthetic data that is generated by large language models might include hallucinations that make the data inaccurate, fabricated, or disconnected from reality. Hallucinations can compromise model performance, accuracy, and relevance.
Disposition: Mapped · Confidence: Clear · Factually inaccurate generated content is hallucination.
ibm-harmful-code-generation
Harmful code generation
Models might generate code that causes harm or unintentionally affects other systems.
Disposition: Mapped · Confidence: Clear · Generated code that harms or affects other systems is insecure or vulnerable code generation; the malicious angle is MR-027.
ibm-harmful-output
Harmful output
A model might generate language that leads to physical harm. The language might include overtly violent, covertly dangerous, or otherwise indirectly unsafe statements.
Disposition: Mapped · Confidence: Partial · Output language that leads to physical or violent harm spans several harm risks; the nearest is dangerous-behavior promotion, with MR-003, MR-004 and MR-049 adjacent.
ibm-incomplete-advice
Incomplete advice
Incomplete advice occurs when a model provides advice without having enough information, resulting in possible harm if the advice is followed.
Disposition: Mapped · Confidence: Clear · Advice given without enough information that causes harm is unsafe or incorrect advice in high-stakes domains.
ibm-over-or-under-reliance
Over- or under-reliance
In AI-assisted decision-making tasks, reliance measures how much a person trusts (and potentially acts on) a model's output. Over-reliance occurs when a person puts too much trust in a model, accepting a model's output when the model's output is likely incorrect. Under-reliance is the opposite, where the person doesn't trust the model but should.
Disposition: Mapped · Confidence: Clear · Misplaced trust in model output is overreliance and automation bias.
ibm-toxic-output
Toxic output
Toxic output occurs when the model produces hateful, abusive, and profane (HAP) or obscene content. This also includes behaviors like bullying.
Disposition: Mapped · Confidence: Clear · Hateful, abusive or obscene output including bullying is toxic and hateful content generation.
ibm-data-contamination
Data contamination
Data contamination occurs when incorrect data is used for training: for example, data that is not aligned with the model's purpose, or data that has already been set aside for other development tasks such as testing and evaluation.
Disposition: Mapped · Confidence: Clear · Incorrect or already-seen data used in training degrades data quality; the benchmark-leak angle is MR-046.
ibm-overfitting
Overfitting
Overfitting occurs when a model or algorithm memorizes or fits too closely to its training data. An overfitted model might not be able to make accurate predictions or draw conclusions from any data other than the training data, and it can fail in unexpected scenarios. Overfitting is also related to model collapse, which occurs when generative models are repeatedly trained on synthetic data generated with LLMs, causing the model to lose information and become less accurate.
Disposition: Mapped · Confidence: Partial · Overfitting causes generalization failure on shifted or edge inputs; the accuracy angle is MR-050.
ibm-unrepresentative-data
Unrepresentative data
Unrepresentative data occurs when the training or fine-tuning data is not sufficiently representative of the underlying population or does not measure the phenomenon of interest. Synthetic data might not fully capture the complexity and nuances of real-world data. Causes include possible limitations in the seed data quality, biases in generation methods, or inadequate domain knowledge. Thus, AI models might struggle to generalize effectively to real-world scenarios.
Disposition: Mapped · Confidence: Clear · Training data not representative of the population is poor data quality and representativeness.
ibm-data-acquisition-restrictions
Data acquisition restrictions
Laws and other regulations might limit the collection of certain types of data for specific AI use cases.
Disposition: Mapped · Confidence: Partial · Legal limits on collecting data are a lawful-processing constraint; the generic-compliance angle is MR-040.
ibm-data-transfer-restrictions
Data transfer restrictions
Laws and other restrictions can limit or prohibit transferring data.
MappedPartialLegal limits on transferring data are a lawful-processing constraint; the generic-compliance angle is MR-040.
ibm-data-usage-restrictions
Data usage restrictions
Laws and other restrictions can limit or prohibit the use of some data for specific AI use cases.
MappedPartialLegal limits on using data for a use case are a lawful-processing constraint; the generic-compliance angle is MR-040.
ibm-data-bias
Data bias
Historical and societal biases might be present in data that are used to train and fine-tune models. Biases can also be inherited from seed data or exacerbated by synthetic data generation methods.
MappedClearHistorical and societal bias in training data is the canonical bias risk; the data-quality mechanism is MR-059.
ibm-confidential-information-in-data
Confidential information in data
Confidential information might be included as part of the data that is used to train or tune the model.
MappedClearConfidential information in training data risks disclosure of confidential information.
ibm-data-usage-rights-restrictions
Data usage rights restrictions
Terms of service, license compliance, or other IP issues may restrict the ability to use certain data for building models.
MappedClearTerms-of-service, licence or IP limits on training data are IP and copyright compliance.
ibm-data-privacy-rights-alignment
Data privacy rights alignment
Applicable laws can establish data subject rights such as opt-out rights, right to access, and right to be forgotten. Synthetic data might raise unique concerns, such as the potential for reidentification of individuals from seemingly anonymous synthetic data. Data subject rights might also be relevant in scenarios where synthetic data is derived from sensitive or personal information.
MappedClearData-subject rights such as access, erasure and opt-out are lawful-processing obligations.
ibm-personal-information-in-data
Personal information in data
Inclusion or presence of personal identifiable information (PII) and sensitive personal information (SPI) in the data used for training or fine tuning the model might result in unwanted disclosure of that information.
MappedClearPII or SPI in training data is collection and processing of personal data.
ibm-reidentification
Reidentification
Even with the removal of personal information (PI) and sensitive personal information (SPI) from data, it might be possible to identify persons due to correlations to other features available in the data.
MappedClearRe-identifying individuals after PI removal is privacy-invasive inference and re-identification.
ibm-data-poisoning
Data poisoning
A type of adversarial attack where an adversary or malicious insider injects intentionally corrupted, false, misleading, or incorrect samples into the training or fine-tuning datasets.
MappedClearInjecting corrupted samples into training data is data and model poisoning and backdoors.
ibm-lack-of-training-data-transparency
Lack of training data transparency
Proper documentation contains information about how a model's data was collected, curated, and used to train a model, including any synthetic data generation processes. Without proper documentation it might be harder to satisfactorily explain the behavior of the model.
MappedClearInsufficient documentation of data collection and curation is the documentation and provenance risk.
ibm-uncertain-data-provenance
Uncertain data provenance
Data provenance refers to the traceability of data (including synthetic data), which includes its ownership, origin, transformations, and generation. Proving that the data is the same as the original source with correct usage terms is difficult without standardized methods for verifying data sources or generation.
MappedClearBeing unable to trace data origin and transformations is the documentation and data-provenance risk.
ibm-improper-data-curation
Improper data curation
Improper collection, generation, and preparation of training or tuning data can result in data label errors, conflicting information or misinformation.
MappedClearLabel errors and conflicting information caused by poor curation are poor data quality.
ibm-improper-retraining
Improper retraining
Using undesirable output (for example, inaccurate, inappropriate, and user content) for retraining purposes can result in unexpected model behavior.
MappedPartialRetraining on undesirable output causes degradation and feedback-loop drift; the data-quality angle is MR-059.

Descriptions are each source framework's own text, where it provides one; long entries are clipped here.