This article describes the knowledge taxonomy that an AI program needs, the system architecture that makes the taxonomy operational, and the cultural practices that determine whether knowledge management becomes a treasured shared resource or a graveyard of half-written wiki pages.
Why AI Programs Have Distinctive Knowledge Needs
Three factors make knowledge management harder for AI than for general engineering work.
First, velocity of model and tool change. Foundation-model releases, framework updates, and tooling evolution operate on a faster cycle than most enterprise functions. A decision documented eighteen months ago may have been correct then and wrong now. The knowledge base must support time-aware retrieval and explicit deprecation.
Second, multi-disciplinary contributors. AI program knowledge spans data engineering, machine learning research, software engineering, legal, ethics, security, and business domain expertise. Each discipline writes differently. The knowledge base must accommodate the format, vocabulary, and review process appropriate to each, while still allowing cross-discipline search.
Third, legal and audit weight. Some AI program documents are evidence in regulatory inquiries; others are internal speculation that should never be cited externally. The knowledge base must support classification and access controls that preserve this distinction. The U.S. National Archives and Records Administration guidance on Federal Agency Records Management at https://www.archives.gov/records-mgmt/policy provides a reference framework that translates well to AI program records.
The Knowledge Taxonomy
A useful taxonomy distinguishes types by their authoring effort, lifecycle, and audience.
Persistent Reference Documents
The longest-lived items: glossaries, principles, policies, framework documentation. These are authored slowly, reviewed broadly, and updated sparingly. Examples in an AI program: the AI ethics policy, the COMPEL methodology overview, the AI glossary covered in Module 1.23, the controlled vocabulary for model classifications.
Living Operational Documents
Documents that describe how the program runs and change as the program evolves: runbooks, RACI matrices, on-call schedules, escalation paths, incident response playbooks. These need clear ownership, version history, and update triggers. The Site Reliability Engineering literature, particularly the Google SRE Book at https://sre.google/sre-book/table-of-contents/, articulates operational documentation patterns that translate well to AI operations.
Per-Artefact Documentation
Model cards, datasheets, system cards, and the per-artefact versions of risk assessments and ethics reviews. These follow a defined structure (covered in the first two articles of Module 1.23), live alongside the artefact they describe, and version with it.
Decision Records
Architectural Decision Records (ADRs) capture the context, options, decision, and consequences of significant choices. The format originated with Michael Nygard at https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions and has spread to AI through projects such as the AI Governance Decision Records pattern. ADRs are particularly valuable months later, when someone needs to understand why the program chose a particular vendor, framework, or evaluation method.
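As one illustration, an ADR can be captured as structured data so decision records are indexable alongside other artefacts. The sketch below is a minimal example; the field names, status values, and the example decision are assumptions for illustration, not a prescribed COMPEL format.

```python
# Minimal sketch of an ADR as structured data, so decision records can be
# indexed and searched like any other artefact. Fields are illustrative.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DecisionRecord:
    title: str
    context: str            # forces at play when the decision was made
    options: list[str]      # alternatives seriously considered
    decision: str           # the choice, stated in full sentences
    consequences: str       # what becomes easier or harder as a result
    status: str = "proposed"                              # proposed | accepted | superseded
    decided_on: date = field(default_factory=date.today)
    authors: list[str] = field(default_factory=list)


adr = DecisionRecord(
    title="Use a hosted vector database for retrieval",
    context="The team lacks capacity to run search infrastructure in-house.",
    options=["Hosted vector database", "Self-managed OpenSearch cluster"],
    decision="Adopt a hosted vector database for the first production release.",
    consequences="Faster delivery now; a migration cost if data-residency rules change.",
    status="accepted",
    authors=["example.author"],
)
```

Whether the record lives as code, YAML, or a wiki page matters less than keeping the core fields and versioning the record with the decision it describes.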
Post-Mortems and Lessons Learned
Time-bound documents that capture what happened during an incident or project, what the team learned, and what the team will do differently. The Federal Aviation Administration’s Lessons Learned from Civil Aviation Accidents library at https://lessonslearned.faa.gov/ illustrates the long-term value of accumulating this kind of record across an industry; AI programs benefit from the same discipline within an organisation.
Patterns and Templates
Reusable problem-solution mappings: a template for stakeholder communication during model retirement, a pattern for handling foundation-model upgrades, a checklist for productionising a Generative AI feature. Patterns are the highest-leverage form of knowledge because they reduce the time to do the next instance.
Working Documents
Drafts, exploration notes, and meeting minutes. These have short half-lives and should expire on a defined schedule.
System Architecture
A workable knowledge management system has three layers.
Storage layer. The actual files: typically a mix of a wiki for narrative content (Confluence, Notion, GitHub Wiki, Google Workspace), version control for structured artefacts (model cards, datasheets, runbooks committed to repositories), and a document management system for formal records (SharePoint, M-Files, Box). The choice matters less than the consistency of the contribution path.
Index and search layer. A unified search across all storage layers, with metadata facets (artefact type, owner, date, status, classification). Modern search platforms (Elastic, Algolia, Glean) combine keyword and semantic search to handle the variation in vocabulary across disciplines. The Linux Foundation’s OpenSearch project at https://opensearch.org/ provides an open-source baseline.
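To make the facet idea concrete, the sketch below shows the kind of per-document metadata a unified index might carry and a simple faceted filter over it. The field names, status values, and URIs are illustrative assumptions; in practice the facets would be configured in whichever search platform the program uses.

```python
# Sketch: per-document metadata for a unified index, plus a faceted filter.
# Fields and example values are illustrative, not tied to any platform.
from dataclasses import dataclass
from datetime import date


@dataclass
class KnowledgeDoc:
    title: str
    artefact_type: str    # e.g. "model-card", "adr", "runbook", "post-mortem"
    owner: str
    updated: date
    status: str           # "draft" | "current" | "deprecated"
    classification: str   # "public" | "internal" | "restricted"
    uri: str


def facet_filter(docs, **facets):
    """Return documents whose metadata matches every requested facet value."""
    return [d for d in docs if all(getattr(d, k) == v for k, v in facets.items())]


docs = [
    KnowledgeDoc("Credit-risk model card", "model-card", "ml-platform",
                 date(2025, 3, 1), "current", "internal",
                 "repo://models/credit-risk/CARD.md"),
    KnowledgeDoc("Incident 42 post-mortem", "post-mortem", "ai-ops",
                 date(2024, 11, 20), "current", "restricted",
                 "wiki://incidents/42"),
]

# A curated audit-preparation view is then just a saved facet combination.
audit_view = facet_filter(docs, artefact_type="model-card", status="current")
```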
Discovery layer. Curated views for specific audiences and tasks. A model deployer needs a different view than a regulator preparing for an audit. Discovery should be opinionated; “search the wiki” is not a discovery layer.
Cultural Practices
The system architecture is necessary but not sufficient. Knowledge management is fundamentally cultural.
Capture is part of the work. A model is not shipped until its model card is current. A project is not done until its lessons-learned record is filed. An incident is not closed until its post-mortem is published. Programs that treat documentation as separate from delivery generate documentation debt that compounds.
Review is part of the work. Every persistent document should have a review cadence — quarterly for operational documents, annually for reference documents. Documents that miss two consecutive reviews are auto-deprecated and surfaced for cleanup.
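A minimal sketch of how the two-missed-reviews rule could be automated is below, assuming quarterly and annual cadences and a recorded last-review date; the cadence values and document types are illustrative.

```python
# Sketch: flag documents that have missed two consecutive review cycles.
# Cadence values and document types are illustrative assumptions.
from datetime import date, timedelta

REVIEW_CADENCE_DAYS = {"operational": 90, "reference": 365}


def missed_reviews(last_reviewed, doc_type, today):
    """Whole review cycles elapsed since the document was last reviewed."""
    cadence = timedelta(days=REVIEW_CADENCE_DAYS[doc_type])
    return (today - last_reviewed) // cadence


def should_auto_deprecate(last_reviewed, doc_type, today):
    # Two consecutive missed reviews triggers auto-deprecation and cleanup.
    return missed_reviews(last_reviewed, doc_type, today) >= 2


# An operational runbook last reviewed in January is two quarterly
# cycles overdue by August of the same year.
assert should_auto_deprecate(date(2024, 1, 10), "operational", date(2024, 8, 1))
assert not should_auto_deprecate(date(2024, 1, 10), "reference", date(2024, 8, 1))
```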
Reuse is rewarded. Practitioners who reuse a pattern should be encouraged to update the pattern with their experience, not silently fork it. Patterns that are reused frequently should be made more prominent in the discovery layer; patterns that have not been reused in a year should be re-examined.
Sources are cited. AI program documents should cite the source that informed them — a regulator publication, a vendor white paper, an internal incident. This makes the document auditable and helps the next reader follow the reasoning.
Authors are named. Unattributed documents are difficult to interrogate later. Naming the author also encourages quality: people write better when their name is attached.
Specific Knowledge Practices for AI
Several practices are distinctive to AI programs.
Foundation-model dependency log. A central register of every external model the program depends on, with version, deprecation status, evaluation results, and migration plan. The register should auto-update from procurement and procurement should auto-update from the register.
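A sketch of what one register entry might record is below; the field names, provider, model identifier, and URIs are hypothetical placeholders, not references to a real vendor.

```python
# Sketch of a single entry in a foundation-model dependency register.
# All names, identifiers, and URIs below are illustrative placeholders.
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelDependency:
    provider: str
    model_id: str                   # the exact version string pinned in production
    used_by: list[str]              # internal systems that call this model
    deprecation_date: date | None   # provider-announced retirement, if known
    eval_results_uri: str           # latest evaluation run for this version
    migration_plan_uri: str | None = None


entry = ModelDependency(
    provider="example-provider",
    model_id="example-model-2024-06-01",
    used_by=["support-assistant", "document-summariser"],
    deprecation_date=date(2026, 6, 1),
    eval_results_uri="wiki://evals/example-model-2024-06-01",
    migration_plan_uri=None,   # an empty plan is itself a flag for follow-up
)
```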
Prompt and prompt-template library. For Generative AI applications, prompts are code. They should be version-controlled, peer-reviewed, and documented with intent and known failure modes.
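One way to treat a prompt as code is to store it as a versioned record that carries its intent and known failure modes next to the template text, reviewed through the same process as any other change. The structure and example content below are a sketch under that assumption.

```python
# Sketch: a prompt template stored as a versioned, reviewed artefact with its
# intent and known failure modes recorded alongside the text it parametrises.
# The structure and example content are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class PromptTemplate:
    name: str
    version: str                  # bumped through the same review process as code
    template: str                 # str.format-style placeholders
    intent: str
    known_failure_modes: list[str] = field(default_factory=list)

    def render(self, **values) -> str:
        return self.template.format(**values)


ticket_summary = PromptTemplate(
    name="ticket-summary",
    version="1.3.0",
    template="Summarise the following support ticket in three bullet points:\n{ticket_text}",
    intent="Produce a neutral summary for hand-off between support tiers.",
    known_failure_modes=[
        "Copies customer PII verbatim into the summary",
        "Invents a resolution when the ticket is still open",
    ],
)

print(ticket_summary.render(ticket_text="Customer reports login failures since Tuesday."))
```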
Evaluation-set library. The datasets used to evaluate models for fairness, robustness, and quality should be discoverable, with provenance and use restrictions documented (datasheets again).
Incident pattern library. Cross-incident analysis surfaces patterns: foundation-model upgrades cause output drift, retrieval pipeline failures cause hallucination spikes, prompt-injection campaigns cluster by attack vector. The pattern library converts individual incidents into program defences.
Vendor evaluation archive. Completed vendor evaluations (per Module 1.10) should be reusable when the same vendor is considered for a different use case, accelerating procurement while preserving rigour.
Common Failure Modes
The first is parallel knowledge — different teams maintain different copies of the same information, drifting apart over time. Counter with single-source-of-truth designation and aggressive deduplication.
The second is knowledge orphans — documents written by people who have since left the organisation and that no current owner has re-attested. Counter with quarterly ownership review and automatic deprecation of orphaned documents older than a defined threshold.
The third is publication bias — only success stories get documented, leaving the program with no record of what was tried and abandoned. Counter by treating abandoned-experiment write-ups as a first-class deliverable with the same rigour as completed work.
The fourth is search failure — the documents exist but cannot be found. Counter with disciplined tagging, semantic search, and periodic findability testing, in which reviewers attempt to locate specific documents and search misses are tracked.
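A findability test can be as simple as a maintained list of realistic queries paired with the document each should surface, run against the index on a schedule. The sketch below assumes a search function that returns the top document URIs for a query; the function, queries, and URIs are hypothetical.

```python
# Sketch of a periodic findability test. The search function, queries, and
# URIs are hypothetical; in practice the call goes to the program's real index.
def findability_test(search_fn, expectations, limit=5):
    """expectations maps a query string to the document URI it should surface."""
    misses = []
    for query, expected_uri in expectations.items():
        top_hits = search_fn(query, limit=limit)
        if expected_uri not in top_hits:
            misses.append((query, expected_uri))
    return misses


expectations = {
    "credit risk model card": "repo://models/credit-risk/CARD.md",
    "prompt injection incident pattern": "wiki://patterns/prompt-injection",
}

# misses = findability_test(real_search, expectations)
# Track len(misses) / len(expectations) over time as the findability metric.
```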
Looking Forward
The next article in Module 1.23 turns to the AI glossary specifically — the smallest, most-used, highest-leverage knowledge artefact a program produces. Building a shared vocabulary is the first knowledge management investment that pays back continuously.
© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.