Why Standardized Documentation Matters
Before the model card paper of Mitchell et al. (FAT* 2019), AI documentation was idiosyncratic. Vendors disclosed whatever they chose; buyers asked for whatever they thought to ask; regulators had no common reference for what a complete description of a model should contain. This produced a market in which adverse selection thrived: opaque vendors competed with transparent ones on price, with no buyer-side mechanism for distinguishing between them.
Standardized documentation addresses three failures simultaneously. It gives buyers a uniform comparison basis across vendors. It gives operators a complete enough description to use the system within its intended scope and detect drift outside it. It gives regulators an inspection target, so audits can focus on whether documentation is accurate rather than on whether documentation exists.
The OECD AI Principles list transparency as one of the five values-based principles; see https://oecd.ai/en/ai-principles. The EU HLEG Ethics Guidelines for Trustworthy AI specify documentation as a sub-requirement of transparency; see https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai. The EU AI Act, in Articles 11–13, makes documentation mandatory for high-risk systems and specifies its content in detail.
Model Cards
A model card is a one-to-five-page structured document that describes an individual machine learning model. The original specification by Mitchell et al. proposed nine sections, which have become the industry baseline:
- Model details — name, version, date, type, architecture, owner.
- Intended use — primary intended uses, primary intended users, out-of-scope uses.
- Factors — relevant demographic, environmental, and instrumentation factors that may affect performance.
- Metrics — performance measures, decision thresholds, variation approaches.
- Evaluation data — datasets used for evaluation, motivation, preprocessing.
- Training data — same disclosures as evaluation data, when public release is possible.
- Quantitative analyses — disaggregated performance across factors (the bias and fairness analysis from Article 3).
- Ethical considerations — sensitive use cases, mitigation strategies, risks.
- Caveats and recommendations — known issues, recommended mitigations, suggested uses.
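The nine sections lend themselves to mechanical completeness checking before a release review. The sketch below is illustrative, not part of the Mitchell et al. specification: the field names, the example model name, and the `missing_sections` helper are all assumed conventions.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCard:
    """The nine baseline sections of a model card (Mitchell et al., 2019)."""
    model_details: dict = field(default_factory=dict)
    intended_use: dict = field(default_factory=dict)
    factors: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    evaluation_data: dict = field(default_factory=dict)
    training_data: dict = field(default_factory=dict)
    quantitative_analyses: dict = field(default_factory=dict)
    ethical_considerations: dict = field(default_factory=dict)
    caveats_and_recommendations: dict = field(default_factory=dict)

    def missing_sections(self) -> list:
        # Empty sections are flagged so a reviewer can block release.
        return [name for name, value in asdict(self).items() if not value]

# A card drafted at project intake: most sections are still empty.
card = ModelCard(
    model_details={"name": "churn-predictor", "version": "2.1.0"},
    intended_use={
        "primary_uses": ["retention outreach prioritization"],
        "out_of_scope": ["credit or employment decisions"],
    },
)
print(card.missing_sections())
```

Running the check on a freshly drafted card lists the seven sections still to be filled in as the corresponding work completes.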
The model card is not a marketing document. Sections 3 (factors), 7 (disaggregated performance), and 8 (ethical considerations) require disclosure of the model’s weaknesses, not just its strengths. A well-written model card tells operators what not to use the model for as clearly as it tells them what to use it for.
Hugging Face has implemented model cards as a first-class object in its model hub since 2020, and the format has become the de facto industry standard for open-source models. Google introduced model cards for its public-facing AI services (Face Detection, Object Detection) in 2019. Microsoft, Meta, OpenAI, and most major commercial vendors now publish model cards or equivalents for their flagship systems.
Datasheets for Datasets
A datasheet for a dataset, introduced by Gebru et al. (Communications of the ACM, 2021), is the analog of a model card for the data on which a model was trained. The format addresses a long-standing problem: that the same data can produce very different models depending on how it was collected, labeled, and processed, but those provenance details are usually invisible to model users.
The seven recommended sections of a datasheet are:
- Motivation — why the dataset was created, by whom, and for whom.
- Composition — what the dataset contains, including demographic distributions, missing data patterns, and known anomalies.
- Collection process — how the data was acquired, what mechanisms were used (sensors, surveys, web scraping), and what consent or permission was obtained.
- Preprocessing/cleaning/labeling — what transformations were applied between raw data and the released dataset.
- Uses — what the dataset has been used for and what it should not be used for.
- Distribution — how the dataset is distributed, under what license, with what restrictions.
- Maintenance — who maintains the dataset, how errors are reported, and how updates are released.
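A data-governance pipeline can enforce the seven sections the same way a model card gate does. In this sketch the section names follow Gebru et al., but the key spelling, the gap-checking function, and the example datasheet are assumptions for illustration.

```python
# The seven recommended datasheet sections (Gebru et al., 2021).
REQUIRED_DATASHEET_SECTIONS = (
    "motivation",
    "composition",
    "collection_process",
    "preprocessing_cleaning_labeling",
    "uses",
    "distribution",
    "maintenance",
)

def datasheet_gaps(datasheet: dict) -> list:
    """Return the required sections that are missing or empty."""
    return [
        section for section in REQUIRED_DATASHEET_SECTIONS
        if not str(datasheet.get(section, "")).strip()
    ]

# A partially completed datasheet for a hypothetical internal dataset.
sheet = {
    "motivation": "Benchmark for intent classification in support tickets.",
    "composition": "48k English tickets; 12% missing product labels.",
    "collection_process": "Exported from the ticketing system with consent.",
    "uses": "Intent classification; not for per-customer profiling.",
}
print(datasheet_gaps(sheet))
```

A dataset with any reported gaps would be refused admission to the training pipeline until the datasheet is completed.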
Datasheets surface considerations that model cards alone cannot. A model card may report that a model was trained on the “ImageNet” dataset; a datasheet for ImageNet would disclose the labor practices used to collect labels, the decisions about which categories to include and exclude, and the geographic and demographic distribution of contributors and depicted subjects. Several recent academic projects have audited widely used datasets retrospectively and found significant issues that would have been disclosed had datasheets been required at the time.
System Cards
A system card describes a deployed AI system as a whole — typically a product or feature that integrates one or more models with surrounding logic, user interfaces, and operational context. The format was popularized by Meta’s system card releases in 2022 and 2023.
A system card includes information that model cards and datasheets cannot capture because it lives at the system level: how the model’s outputs are combined with other inputs, what user-facing controls exist, what content moderation or safety filters are applied, what the system is and is not allowed to do, and what telemetry is collected about its behavior.
System cards are particularly important for generative AI products, where the same underlying model may produce vastly different user experiences depending on the surrounding system design. A large language model wrapped in a customer service chatbot, a coding assistant, and a creative writing tool requires three different system cards even if the underlying model card is identical.
The Relationship Among the Three Formats
The three formats are nested. A datasheet describes a dataset. A model card describes a model and references the datasets used to train and evaluate it (each of which has its own datasheet). A system card describes a deployed system and references the models embedded in it (each of which has its own model card).
A complete documentation package for a deployed AI product therefore typically includes one system card, references to one or more model cards, and references through those model cards to the underlying dataset datasheets. When done well, this nested structure allows a regulator, auditor, or sophisticated buyer to drill from product behavior down to underlying training data without losing context.
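The nesting can be represented as plain references between documents, which is what makes the drill-down mechanical. The toy registry below shows the idea; every identifier and field name is invented for illustration.

```python
# Toy documentation registry: each layer references the layer below by ID.
datasheets = {
    "ds-tickets-v3": {"motivation": "Support-ticket intent benchmark."},
}
model_cards = {
    "mc-intent-2.1": {
        "model_details": {"name": "intent-classifier", "version": "2.1"},
        "training_data": ["ds-tickets-v3"],
        "evaluation_data": ["ds-tickets-v3"],
    },
}
system_cards = {
    "sc-support-bot": {
        "description": "Customer support chatbot",
        "models": ["mc-intent-2.1"],
    },
}

def datasets_behind_system(system_id: str) -> set:
    """Drill from a system card through its model cards to the datasheets."""
    found = set()
    for mc_id in system_cards[system_id]["models"]:
        mc = model_cards[mc_id]
        found.update(mc.get("training_data", []))
        found.update(mc.get("evaluation_data", []))
    return found

print(datasets_behind_system("sc-support-bot"))
```

An auditor starting from the product-level card can reach every underlying datasheet in one traversal, which is the "drill down without losing context" property in practice.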
Operational Practice
The most common failure mode is that documentation is treated as a final deliverable produced by the development team after the model is finished. Three operational disciplines avoid this.
Document as you build. Draft a model card at project intake, fill in each section as the corresponding work is completed, and hold the model card review at the same gate as the model itself. Documentation written months after the work is unreliable; documentation written alongside the work is part of the work.
Tie documentation to release. A model that does not have a current model card cannot be released to production. A dataset that does not have a current datasheet cannot be admitted to the training pipeline. A system that does not have a current system card cannot be exposed to external users. These gates require executive backing because they will sometimes block releases.
Refresh on change. Documentation that is accurate at launch becomes inaccurate as the system evolves. A documentation refresh should be triggered by any model retraining, dataset update, or material system reconfiguration. The refresh cadence should be defined explicitly — typically at each retraining cycle (often quarterly) for active systems and at every release for shipping software.
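Both the release gate and the refresh rule reduce to one predicate that a CI pipeline can evaluate: the document must postdate the last material change to the artifact and must be within the defined cadence. A minimal sketch, in which the 90-day default and the function signature are assumptions:

```python
from datetime import date, timedelta

def doc_is_current(doc_updated: date, artifact_changed: date, today: date,
                   max_age: timedelta = timedelta(days=90)) -> bool:
    """A document is current if it postdates the last material change
    to the artifact and falls within the defined refresh cadence."""
    return doc_updated >= artifact_changed and (today - doc_updated) <= max_age

# Model retrained June 1, card refreshed June 3, gate checked July 1: passes.
print(doc_is_current(date(2024, 6, 3), date(2024, 6, 1), date(2024, 7, 1)))
# Model retrained June 10 with no refresh since June 3: blocked.
print(doc_is_current(date(2024, 6, 3), date(2024, 6, 10), date(2024, 7, 1)))
```

Wiring this predicate into the release pipeline is what gives the gates their teeth: a stale card blocks the release mechanically rather than relying on reviewer memory.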
The Partnership on AI’s About ML project provides templates and procedural guidance for operationalizing these practices; see https://partnershiponai.org/. The IEEE 7001 standard on transparency provides a more formal specification that procurement teams can incorporate into contracts; see https://standards.ieee.org/ieee/7000/6781/ for the IEEE 7000 family overview.
Regulatory and Procurement Pressure
Standardized documentation is rapidly transitioning from voluntary best practice to regulatory expectation. The EU AI Act requires technical documentation for high-risk systems that overlaps substantially with model card and datasheet content. The Singapore IMDA Model AI Governance Framework recommends model cards or equivalents for all consequential systems; see https://www.pdpc.gov.sg/help-and-resources/2020/01/model-ai-governance-framework. The NIST AI Risk Management Framework treats documentation as a measurable function and provides specific guidance in its companion playbook; see https://www.nist.gov/itl/ai-risk-management-framework.
On the buyer side, the Algorithmic Accountability Act introduced in the US Congress (H.R. 5628) would require impact assessments that effectively codify many model card and datasheet disclosures for federal procurement; see https://www.congress.gov/bill/118th-congress/house-bill/5628. Major federal procurement organizations and several state governments have already begun including documentation requirements in their AI tender documents.
Maturity Indicators
- Level 1: No standardized documentation; what exists is ad-hoc and incomplete.
- Level 2: A documentation template exists but is inconsistently applied.
- Level 3: Model cards (and datasheets for proprietary datasets) are mandatory for high-risk systems, with defined sections and a release gate.
- Level 4: Documentation is created in parallel with development, refreshed on every material change, and stored in a queryable corporate registry. System cards exist for all customer-facing AI products.
- Level 5: Documentation is published externally; the organization contributes to industry documentation standards; documentation completeness and quality are tracked as product KPIs.
Practical Application
Three first steps. First, adopt the Mitchell et al. model card format as the corporate standard, with no modifications, so that internal documentation is comparable to external publications and to vendor disclosures the organization receives. Second, require a draft model card at the use-case approval gate (Article 14) and a complete model card at the pre-deployment review gate. Third, build a corporate model card registry — even as a simple wiki or shared document repository — so that operators and auditors can find documentation without asking the development team.
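The suggested registry need not be elaborate to deliver its core value — findability without asking the development team. An in-memory sketch, where the class, its methods, and the example model are hypothetical conventions rather than a product recommendation:

```python
class ModelCardRegistry:
    """Minimal registry: cards keyed by (model name, version), so that
    operators and auditors can locate documentation directly."""

    def __init__(self):
        self._cards = {}

    def register(self, name: str, version: str, card: dict) -> None:
        self._cards[(name, version)] = card

    def get(self, name: str, version: str):
        return self._cards.get((name, version))

    def versions(self, name: str) -> list:
        # All documented versions of a model, oldest first.
        return sorted(v for (n, v) in self._cards if n == name)

reg = ModelCardRegistry()
reg.register("churn-predictor", "2.0.0", {"intended_use": "retention outreach"})
reg.register("churn-predictor", "2.1.0", {"intended_use": "retention outreach"})
print(reg.versions("churn-predictor"))
```

Even this level of structure answers the two questions auditors ask most: which versions of a model exist, and where is the card for each.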
Looking Ahead
Article 7 turns from documentation to deliberation: the design and operation of AI ethics review boards. Documentation is the input; structured ethics review is the process that turns it into a defensible decision.
© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.