AITF M1.23-Art04 v1.0 Reviewed 2026-04-06 Open Access
AITF · Foundations

AI Glossary: Building Shared Vocabulary in Your Org


7 min read Article 4 of 4

This article describes the role a glossary plays in AI program coherence, the structure of a useful glossary entry, the governance that keeps the glossary trustworthy, and the techniques for driving adoption beyond the small group that wrote it.

Why Vocabulary Drift Is Expensive

Three concrete costs make the case for investment.

First, decision incoherence. When “high-risk model” means different things to the security team, the legal team, and the product team, gating decisions diverge. A model the security team treats as low-risk may be classified high-risk by legal, with no procedural way to reconcile the difference. Vocabulary alignment is a precondition for decision alignment.

Second, regulatory exposure. Article 3 of the European Union AI Act (https://artificialintelligenceact.eu/article/3/) defines specific terms with specific meanings: “AI system,” “general-purpose AI model,” “high-risk AI system,” “deployer,” “provider.” These definitions trigger specific obligations. An organisation whose internal vocabulary does not align with the regulatory vocabulary will misclassify its own systems and miss obligations.

Third, onboarding cost. A new hire to an AI program spends weeks learning the local meaning of words they thought they already understood. A published glossary cuts the onboarding curve dramatically and makes the organisation’s culture more accessible.

The U.S. National Institute of Standards and Technology has invested heavily in shared vocabulary through the AI RMF Glossary at https://airc.nist.gov/AI_RMF_Knowledge_Base/Glossary, recognising that a common language is necessary infrastructure for risk management.

The Anatomy of a Good Glossary Entry

A useful entry has more structure than a dictionary definition.

Term

The term itself, including any common abbreviation. For multi-word terms, the canonical form should be specified (capitalisation, hyphenation, plural).

One-Sentence Definition

A definition that fits in a single sentence and reads naturally when quoted in one. The definition should be operational: it should tell the reader what makes something an instance of the term and what excludes other things from being instances.

Extended Explanation

One to three paragraphs that elaborate, give examples, and clarify common confusions. The extended explanation is where the glossary earns its keep; the one-sentence definition by itself is rarely sufficient for non-trivial terms.

Authoritative Source

A citation to the external standard, regulation, or framework that informs the definition. For terms drawn from the EU AI Act, the article number. For terms drawn from ISO/IEC 22989:2022 (AI Concepts and Terminology) at https://www.iso.org/standard/74296.html, the clause number. For terms drawn from NIST AI RMF, the section. The source enables the reader to drill into the original context if needed.

Synonyms and Disambiguations

Other words that are sometimes used for the same concept (and the program’s preference among them) and other concepts that use similar words but mean different things. Synonyms reduce search friction; disambiguations prevent silent miscommunication.

Cross-links to related glossary entries that the reader is likely to need next. The cross-links create the graph that makes the glossary navigable.

Examples

Two or three concrete examples drawn from the organisation’s own context. Examples are what make abstract definitions sticky.

Audience Notes

For terms whose interpretation varies by audience (technical, legal, business), brief notes that translate between them. A “model deployment” means something specific to a Machine Learning (ML) engineer, something different to a product owner, and something different again to a regulator.
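Taken together, the fields above amount to a small schema. A minimal sketch in Python; the field names and the completeness rule are illustrative assumptions, not a prescribed format:

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    """One glossary record; field names are illustrative, not a standard."""
    term: str                     # canonical form, e.g. "High-Risk AI System"
    definition: str               # one-sentence operational definition
    explanation: str = ""         # extended explanation, one to three paragraphs
    source: str = ""              # e.g. "EU AI Act, Article 3"
    synonyms: list[str] = field(default_factory=list)
    disambiguations: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)   # cross-links to other terms
    examples: list[str] = field(default_factory=list)  # drawn from the org's own context
    audience_notes: dict[str, str] = field(default_factory=dict)  # audience -> note

    def is_complete(self) -> bool:
        # One possible minimum bar: a definition, a source, and at least one example.
        return bool(self.definition and self.source and self.examples)
```

Keeping the structure explicit makes gaps visible: an entry without a source or an example fails the check rather than quietly shipping half-finished.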

Governance

A glossary becomes worse, not better, without governance. Four governance practices keep it useful.

Editorial board. A small group — typically a senior member from data science, engineering, legal, and risk — owns the glossary. New entries are proposed through a defined process and reviewed by the board. The board also resolves contested definitions, which is its highest-value function.

Versioning and deprecation. Definitions change as the program matures. Each entry should record its last review date and the reviewer’s name. Definitions that change should retain history; definitions that become obsolete should be marked deprecated rather than deleted. The deprecation note should point to the replacement.
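The versioning practice can be made concrete as per-entry metadata. A hypothetical sketch; the field names, the reviewer, and the one-year review cycle are assumptions, not prescribed values:

```python
from datetime import date

# Hypothetical review and deprecation metadata for one glossary entry,
# illustrating: last review date, reviewer name, retained history,
# and a deprecation pointer to the replacement term.
entry_meta = {
    "term": "Model Deployment",
    "last_reviewed": date(2026, 4, 6),
    "reviewer": "J. Doe",          # illustrative name
    "deprecated": False,
    "replaced_by": None,           # set to the successor term when deprecated
    "history": [                   # prior definitions are retained, not deleted
        {"version": 1, "changed": date(2025, 1, 10), "note": "initial definition"},
    ],
}

def needs_review(meta: dict, today: date, max_age_days: int = 365) -> bool:
    """Flag entries whose last review is older than the mandated cycle."""
    return (today - meta["last_reviewed"]).days > max_age_days
```

A scheduled job over all entries turns the annual-review mandate into a checklist rather than a hope.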

Sourcing discipline. Where an entry diverges from an external standard, the divergence should be explicit and justified. Local terminology that contradicts ISO or regulatory terminology without explanation is a future audit finding.

Translation alignment. For multilingual organisations, translated glossaries should be derived from the canonical glossary, not authored independently. Translation discrepancies are a source of cross-border decision drift.

Adoption Techniques

A glossary that exists but is not used is wasted work. Adoption techniques are what turn it into shared infrastructure.

Discoverability. The glossary should be findable in seconds from the AI program landing page, the data catalogue, and the model registry. It should be searchable from the wiki, from the chat platform (slash command), and from the IDE for engineers.

Embedding. Other AI program documents should link directly to glossary entries. A model card that references “high-risk system” should hyperlink to the glossary definition. A policy that references “deployer” should link to the entry. Embedding makes the glossary load-bearing for the rest of the documentation.
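Embedding can be partially automated. A minimal sketch that hyperlinks known glossary terms in markdown text; the term list and the URL scheme are assumptions:

```python
import re

# Hypothetical term -> glossary-page mapping; the URL scheme is assumed.
GLOSSARY = {
    "high-risk system": "/glossary/high-risk-system",
    "deployer": "/glossary/deployer",
}

def link_terms(markdown: str) -> str:
    """Replace each known glossary term with a markdown link to its entry."""
    for term, url in GLOSSARY.items():
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        markdown = pattern.sub(lambda m: f"[{m.group(0)}]({url})", markdown)
    return markdown
```

A real implementation would skip terms already inside links or code spans; the point is that load-bearing links need not be maintained by hand.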

Onboarding. New hires should be introduced to the glossary in the first week. The introduction should include both the structure (where to find what) and the philosophy (the glossary is authoritative; if the term is in the glossary, use the glossary’s definition).

Recurring exposure. Quarterly updates, internal newsletters, and learning campaigns can highlight new and updated entries. The repetition reinforces the habit of consulting the glossary.

Live consultation. The editorial board should be reachable through a defined channel for fast clarification questions. The questions themselves are valuable — they identify gaps and ambiguities that drive the next round of edits.

Specific Term Categories That Pay Back

Certain term categories return investment quickly.

Risk classifications: high-risk, unacceptable-risk, limited-risk, minimal-risk. The categories drive procedural workflows; without shared definitions the workflows themselves are unstable.
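The link from classification to workflow can be encoded directly. A hypothetical mapping; the tier names follow the EU AI Act categories, but the gate names are illustrative:

```python
# Illustrative mapping from risk classification to required governance gates.
# The tiers mirror the EU AI Act categories; the gates are assumptions.
GATES_BY_TIER = {
    "unacceptable-risk": ["prohibited"],
    "high-risk": ["legal review", "conformity assessment", "human oversight plan"],
    "limited-risk": ["transparency notice"],
    "minimal-risk": [],
}

def required_gates(tier: str) -> list[str]:
    # Unknown tiers fail loudly rather than silently skipping governance.
    if tier not in GATES_BY_TIER:
        raise ValueError(f"unknown risk tier: {tier}")
    return GATES_BY_TIER[tier]
```

With shared tier definitions, the mapping is stable; without them, two teams running this same table reach different gates for the same system.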

Lifecycle stages: in development, in evaluation, in pilot, in production, in decommissioning. Each stage typically has different governance requirements; ambiguity at the boundary causes systems to slip through gates.

Roles: provider, deployer, distributor, user, affected person, data subject, operator. Many AI regulations distribute obligations by role; ambiguity creates either compliance gaps or unnecessary process.

Data sensitivities: personal data, sensitive personal data, special category data, anonymised data, pseudonymised data. The General Data Protection Regulation and similar laws use these terms with precision; internal vocabulary should mirror the precision.

Generative AI specifics: prompt, system prompt, retrieval, grounding, hallucination, tool use, agent, autonomy. Generative AI introduced an entire vocabulary that the broader organisation may not yet share. Capturing it early prevents later confusion.

Common Failure Modes

The first is over-collection — the glossary includes thousands of entries copied from external standards, most of which the organisation does not actually use. Counter by curation: only include terms the organisation actually uses, marked with the source.
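Curation can be enforced mechanically. A minimal sketch that keeps only terms actually observed in the organisation's own documents; the substring matching and threshold are simplistic assumptions:

```python
def curate(glossary: dict[str, str], corpus: list[str], min_hits: int = 1) -> dict[str, str]:
    """Keep only glossary terms that appear at least min_hits times in the corpus."""
    text = " ".join(corpus).lower()
    return {
        term: definition
        for term, definition in glossary.items()
        if text.count(term.lower()) >= min_hits
    }
```

Running this against a sample of internal documents before publishing separates the terms the organisation uses from the thousands it merely imported.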

The second is aspirational definitions — definitions that describe the meaning the editorial board wishes the term had, not the meaning the organisation actually uses. Counter by sampling actual usage and aligning the definition with practice (or by changing practice to match the definition, with explicit communication).

The third is neglected maintenance — the glossary becomes stale and people stop trusting it. Counter by mandatory annual review of every entry, with the reviewer’s name on the line.

The fourth is isolation from regulation — the glossary diverges silently from regulatory definitions. Counter by mandatory cross-reference to the source standard, with explicit divergence notes when local meaning differs.

Looking Forward

Module 1.23 closes here. The next module (M1.24) turns to AI resilience — the practices and infrastructure that keep AI systems running through failure, with the documentation discipline of this module providing the foundation that resilience operations rest on.


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.