AITF M1.10-Art06 v1.0 Reviewed 2026-04-06 Open Access

AI Bill of Materials — MBOM and Model Lineage


7 min read Article 6 of 15

This article defines the AI-BOM and MBOM, names the standards that are converging on canonical formats, explains how lineage information is captured and verified, and connects the AI-BOM to the broader regulatory expectation of supply-chain visibility.

Why the SBOM Movement Now Reaches AI

The U.S. Cybersecurity and Infrastructure Security Agency (CISA) Software Bill of Materials programme at https://www.cisa.gov/sbom established the SBOM as the canonical inventory of software components in a deployed system. Following Executive Order 14028, federal-government supplier expectations have driven SBOM adoption broadly across the U.S. software industry, and the practice is now international. The Software Package Data Exchange (SPDX) standard at https://spdx.dev/ provides one of the two dominant SBOM file formats (alongside CycloneDX) and supplies the canonical machine-readable vocabulary for declaring component origins and licenses.

AI systems extend the SBOM challenge in three directions. First, AI introduces new artefact types — model weights, training datasets, fine-tuning datasets, embeddings, prompt templates — that conventional SBOMs do not enumerate. Second, AI components are produced by deeper supply chains than typical libraries, with foundation-model providers, data brokers, and fine-tuners all contributing. Third, AI components mutate after deployment in ways libraries do not — model versions update silently, embeddings are recomputed, prompts are tuned. The AI-BOM and MBOM constructs are the response.

Both SPDX and CycloneDX have published extensions to cover model and dataset components. CISA convenes working groups on AI-BOM specifically. The European Union (EU) AI Act, accessible at https://artificialintelligenceact.eu/, requires technical documentation for high-risk systems under Annex IV that materially overlaps with AI-BOM content. The International Organization for Standardization / International Electrotechnical Commission (ISO/IEC) 42001:2023 standard at https://www.iso.org/standard/81230.html assumes that AI components are inventoried, which the AI-BOM provides.

What an AI-BOM Records

A defensible AI-BOM captures, at minimum, the following classes of information for every component.

Component Identity

A unique identifier (a Package URL, Common Platform Enumeration, or equivalent), the component name, version, and producer. For models, this includes the foundation model family (where derived) and the specific version or fine-tune identifier.

Origin and Provenance

Where the component was obtained, the cryptographic hash of the artefact, the build attestation (where available), and the chain-of-custody evidence linking the deployed artefact to its declared origin. Supply-chain Levels for Software Artifacts (SLSA) at https://slsa.dev/ defines the four levels of build-pipeline integrity that anchor provenance claims for software and increasingly for model artefacts.
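A provenance entry ultimately rests on a digest of the artefact itself. A minimal sketch, streaming the file so large weight files do not need to fit in memory (the file name, origin URL, and record fields are placeholders):

```python
import hashlib
from pathlib import Path


def artefact_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream an artefact from disk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in artefact for illustration only.
weights = Path("model.safetensors")
weights.write_bytes(b"\x00" * 16)

provenance = {
    "obtained_from": "https://example.com/models/acme",  # hypothetical origin
    "sha256": artefact_sha256(weights),
    "build_attestation": None,  # record the absence explicitly when unavailable
}
print(provenance["sha256"])
```

The digest recorded here is the anchor that later chain-of-custody and load-time checks compare against.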

License and Acceptable-Use Terms

The SPDX license identifier (or a custom-license placeholder for non-standard model licenses), the acceptable-use policy reference, and any commercial-use, downstream-distribution, or derivative-work restrictions. The AI-BOM is the canonical place where license obligations are recorded for downstream enforcement.

Training and Fine-Tuning Data

For models, the datasets used for pre-training and fine-tuning, with provenance and license assertions. Where the upstream provider does not disclose training data (the typical case for major proprietary foundation models), the AI-BOM records “undisclosed by provider” rather than leaving the field blank — the absence is itself documented.

Behavioural Characteristics

References to the model card, evaluation results, and any safety or bias assessments. The Stanford Foundation Model Transparency Index at https://crfm.stanford.edu/fmti/ defines the categories of disclosure that downstream users increasingly expect upstream providers to publish; AI-BOM entries can reference the published transparency disclosures.

Hosting and Sub-Processor Topology

Where the component runs (cloud region, dedicated tenant, on-premises), which sub-processors are involved, and which jurisdictions data flows through. This data underpins the cross-border-transfer governance addressed in Article 12 of this module.

Cryptographic Verification Artefacts

Hashes, signatures, and attestations that allow runtime verification that the deployed artefact matches what was approved. The Hugging Face Safetensors format documented at https://huggingface.co/docs/safetensors illustrates the cryptographic-verification surface that an AI-BOM can reference.
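A minimal sketch of load-time verification, assuming the approved digest is read from the AI-BOM entry; file names and contents are placeholders, and real deployments would verify signatures and attestations as well:

```python
import hashlib
import hmac
from pathlib import Path


def verify_against_aibom(path: Path, approved_sha256: str) -> bool:
    """True only if the on-disk artefact matches the AI-BOM's approved digest."""
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    # Constant-time comparison avoids leaking digest prefixes via timing.
    return hmac.compare_digest(actual, approved_sha256)


artefact = Path("weights.bin")
artefact.write_bytes(b"approved weights")
approved = hashlib.sha256(b"approved weights").hexdigest()  # from the AI-BOM

assert verify_against_aibom(artefact, approved)       # matches: load proceeds

artefact.write_bytes(b"tampered weights")
assert not verify_against_aibom(artefact, approved)   # mismatch: refuse to load
```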

How an MBOM Differs

The Model Bill of Materials (MBOM) zooms in on the model. Where the AI-BOM enumerates everything in the system, the MBOM details the model artefact specifically: foundation model lineage, fine-tuning steps, evaluation results, version history, and the prompts or system messages that materially shape behaviour. For systems where the model is the primary risk surface, the MBOM is the document that risk reviewers, auditors, and incident responders consult.

The MBOM is also where Reinforcement Learning from Human Feedback (RLHF) and similar post-training adaptation steps are recorded. Without explicit MBOM capture, two fine-tunes of the same foundation model are indistinguishable to downstream consumers — a serious supply-chain visibility gap.
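The lineage chain that distinguishes two fine-tunes of the same foundation model can be sketched as a linked list of revisions, each recording its adaptation step. Identifiers and step names below are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRevision:
    """One step in a model's lineage chain (illustrative MBOM fragment)."""
    identifier: str
    step: str                                  # "pretraining", "sft", "rlhf", ...
    parent: "ModelRevision | None" = None


def lineage(model: ModelRevision) -> list[str]:
    """Walk parent links back to the foundation model, oldest first."""
    chain = []
    node = model
    while node is not None:
        chain.append(f"{node.identifier} ({node.step})")
        node = node.parent
    return list(reversed(chain))


base = ModelRevision("base-model-v1", "pretraining")
sft = ModelRevision("base-model-v1-sft-legal", "sft", parent=base)
rlhf = ModelRevision("base-model-v1-sft-legal-rlhf-3", "rlhf", parent=sft)
print(lineage(rlhf))
```

Two fine-tunes of `base-model-v1` now produce distinct chains rather than indistinguishable artefacts.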

Where the AI-BOM Is Generated

In mature programs, the AI-BOM is generated automatically from build, deployment, and runtime telemetry. Manually maintained AI-BOMs are stale by the time they are reviewed and are easily falsified. Automation requires three capabilities: a model registry that records every model artefact entering production, a component-discovery process that identifies AI dependencies in application code, and a policy engine that enforces required AI-BOM completeness before deployment.
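The policy-engine capability reduces to a completeness check run before deployment. A sketch under the assumption of a flat entry with a hypothetical required-field set:

```python
REQUIRED_FIELDS = {"purl", "version", "producer", "sha256", "license"}


def completeness_gaps(entry: dict) -> set[str]:
    """Fields that are missing or empty; deployment is blocked unless this is empty."""
    return {f for f in REQUIRED_FIELDS if not entry.get(f)}


candidate = {
    "purl": "pkg:generic/acme/summarizer@2.1.0",
    "version": "2.1.0",
    "producer": "Acme ML Platform",
    "sha256": "",                # hash not yet recorded: counts as a gap
}

gaps = completeness_gaps(candidate)
if gaps:
    print(f"deployment blocked, missing: {sorted(gaps)}")  # ['license', 'sha256']
```

In a CI/CD pipeline the same check would run as a gate, failing the deployment job whenever `gaps` is non-empty.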

Cloud providers and Machine Learning Operations (MLOps) platforms increasingly emit AI-BOM data natively. The Cloud Security Alliance at https://cloudsecurityalliance.org/ has published guidance on integrating these emissions into enterprise asset-management systems. The GOVERN-6 control of the U.S. National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF) at https://www.nist.gov/itl/ai-risk-management-framework assumes inventory completeness, which automated AI-BOM generation provides.

How the AI-BOM Connects to Risk

The AI-BOM is operational input to almost every other supply-chain control. Vendor risk reassessment (Article 3) consumes it. Continuous monitoring (Article 10) keys off it. Incident response (Article 14) starts with it. Tiered programs (Article 15) use it to classify systems. The cybersecurity supply-chain risk-management practice defined in NIST Special Publication (SP) 800-161 Revision 1 at https://csrc.nist.gov/pubs/sp/800/161/r1/final assumes a current, accurate inventory of components — for AI systems, the AI-BOM is that inventory. An organization that cannot produce an AI-BOM on demand cannot operate any of the controls described in this module reliably.

Maturity Indicators

Maturity | What AI-BOM and MBOM look like
Foundational (1) | No AI-BOM exists; teams cannot enumerate the models in production.
Developing (2) | Manually maintained AI-BOM exists for selected systems; coverage is partial; staleness is acknowledged.
Defined (3) | AI-BOM is mandatory for every system above the standard tier; MBOM accompanies every model in the registry; SPDX or equivalent format is used.
Advanced (4) | AI-BOM is generated automatically from build pipelines and updated continuously; cryptographic provenance verification runs at load time.
Transformational (5) | The organization contributes to AI-BOM standards (SPDX, CycloneDX, CISA AI-BOM); supplier AI-BOMs flow directly into the enterprise AI-BOM through automated exchange.

Practical Application

An asset manager preparing to obtain ISO/IEC 42001 certification should treat the AI-BOM as the foundational artefact of the management-system implementation. A pilot AI-BOM should be produced for the three highest-risk AI systems first, populated with all components, licenses, and provenance available, and supplemented by explicit “undisclosed by provider” entries where upstream providers do not reveal training data or build attestations. The pilot informs tooling decisions and template refinement; subsequent rollout extends coverage to every system the management system claims. Without an AI-BOM, the management system has nothing concrete to manage.

The next article (Article 7) drills into the most consequential and most under-documented AI-BOM entry of all: the training data lineage that determines what a model knows, who can claim to own its outputs, and what regulatory exposure follows.