The Sources of Unfairness
Unfairness enters AI systems through at least five distinct mechanisms, and the choice of mitigation depends on which mechanism is dominant in a given case.
Historical bias arises when training data reflects past discrimination. A hiring model trained on twenty years of resumes from a male-dominated profession will learn that male candidates were historically successful and may perpetuate that pattern even if the present-day labor pool is balanced.
Representation bias arises when data collection systematically under-samples some groups. Facial recognition systems trained primarily on light-skinned faces have well-documented higher error rates on darker-skinned faces — a result first quantified at scale by Buolamwini and Gebru in the 2018 Gender Shades study.
Measurement bias arises when the same construct is measured differently across groups. Standardized tests that have different predictive validity for different demographic groups, or medical risk scores that use healthcare spending as a proxy for health (which systematically disadvantages groups with less access to care), are well-known examples.
Aggregation bias arises when a single model is fit to a heterogeneous population for which different sub-populations follow different patterns. A diabetes risk model fitted across all ethnicities may perform poorly on each individual ethnicity even if it performs adequately on average.
Deployment bias arises when a model is used in a context that differs from the one it was trained for — for instance, a triage model trained on emergency room data deployed in a primary care setting.
The five sources demand different responses. Historical and representation biases are addressed primarily through data-side interventions. Measurement bias requires reconsidering the target variable. Aggregation bias may require segmented models. Deployment bias requires governance discipline at the use-case approval gate.
Formal Definitions of Fairness
The fair-machine-learning literature has converged on three families of group-level fairness definitions, each capturing a different normative intuition.
Demographic parity (also called statistical parity). The probability of a positive outcome should be equal across protected groups. Formally, P(prediction = 1 | group = A) = P(prediction = 1 | group = B). A loan model satisfies demographic parity if it approves the same percentage of applicants regardless of race or gender. This definition aligns with the legal doctrine of disparate impact in US employment law.
Equality of opportunity. Among people who would actually succeed (the “positive class” in the ground truth), the probability of being correctly identified should be equal across groups. Formally, P(prediction = 1 | outcome = 1, group = A) = P(prediction = 1 | outcome = 1, group = B). A hiring model satisfies equality of opportunity if equally qualified candidates are equally likely to be hired regardless of group membership.
Predictive parity (also called calibration within groups). Among people predicted to be positive, the actual rate of true positives should be equal across groups. Formally, P(outcome = 1 | prediction = 1, group = A) = P(outcome = 1 | prediction = 1, group = B). A risk score satisfies predictive parity if a “high-risk” label means the same probability of the outcome regardless of group.
These definitions sound similar but capture different commitments. Demographic parity is concerned with equality of outcomes; equality of opportunity with equality of access conditional on merit; predictive parity with consistent meaning of model outputs across groups.
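To make the distinctions concrete, the sketch below computes all three rates from a set of binary predictions. It is an illustration only: the arrays, group labels, and synthetic data are hypothetical and assume nothing about any particular model.

```python
import numpy as np

def group_fairness_report(y_true, y_pred, group):
    """Compute the three group-level fairness rates for a binary classifier.

    y_true, y_pred, and group are 1-D arrays of equal length; group holds a
    label such as "A" or "B" for each individual (illustrative names only).
    """
    report = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        report[g] = {
            # Demographic parity compares: P(prediction = 1 | group = g)
            "selection_rate": yp.mean(),
            # Equality of opportunity compares: P(prediction = 1 | outcome = 1, group = g)
            "true_positive_rate": yp[yt == 1].mean(),
            # Predictive parity compares: P(outcome = 1 | prediction = 1, group = g)
            "positive_predictive_value": yt[yp == 1].mean(),
        }
    return report

# Synthetic data purely for illustration: a definition is satisfied when the
# corresponding rate is (approximately) equal across the two groups.
rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=1000)
y_true = rng.binomial(1, np.where(group == "A", 0.5, 0.3))
y_pred = rng.binomial(1, np.where(y_true == 1, 0.8, 0.2))
print(group_fairness_report(y_true, y_pred, group))
```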
The Impossibility Theorem
In 2016, three independent papers — Chouldechova, Kleinberg-Mullainathan-Raghavan, and Berk et al. — proved that no classifier can simultaneously satisfy demographic parity, equality of opportunity, and predictive parity unless either the base rates of the outcome are identical across groups or the classifier is perfect. In real-world applications, base rates differ across groups for many reasons (some legitimate, some reflecting historical injustice), and no classifier is perfect. Therefore, the choice of fairness definition is unavoidable.
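The arithmetic behind the theorem is compact. For a binary classifier, a group's false positive rate, true positive rate, positive predictive value, and base rate p are linked by the identity FPR = (p / (1 - p)) * ((1 - PPV) / PPV) * TPR. If two groups share the same PPV (predictive parity) and the same TPR (equality of opportunity) but have different base rates, their false positive rates must differ, and their selection rates (which work out to TPR * p / PPV) differ as well, so demographic parity fails. The short sketch below, with made-up numbers, illustrates the point.

```python
def false_positive_rate(base_rate, ppv, tpr):
    """False positive rate implied by a group's base rate p, positive
    predictive value (PPV), and true positive rate (TPR), via the identity
    FPR = (p / (1 - p)) * ((1 - PPV) / PPV) * TPR."""
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr

# Hypothetical numbers: both groups get the same PPV (predictive parity) and
# the same TPR (equality of opportunity), but their base rates differ.
for group, base_rate in [("A", 0.5), ("B", 0.3)]:
    print(group, round(false_positive_rate(base_rate, ppv=0.7, tpr=0.8), 3))
# Prints roughly: A 0.343, B 0.147. With equal PPV and TPR but unequal base
# rates, the false positive rates (and the selection rates) cannot also be
# equal, so the three criteria cannot all hold at once.
```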
The best-known case study is the COMPAS recidivism risk score, which ProPublica reported in 2016 had higher false-positive rates for Black defendants than for white defendants. Northpointe (the vendor) replied that COMPAS satisfied predictive parity — that is, a “high-risk” score meant the same recidivism probability regardless of race. Both claims were correct simultaneously. The disagreement was not about the math but about which fairness definition the system should have prioritized.
The impossibility theorem has three implications for practitioners. First, choosing a fairness definition is a normative decision, not a technical one — and therefore belongs to the ethics review process, not to the data science team. Second, the choice must be documented and justified in language that a non-technical stakeholder can understand. Third, the choice should be revisited when context changes (for example, when a model designed for a screening use case is repurposed for a final-decision use case).
Individual Fairness
Group-level definitions can be satisfied while individuals within a group are treated arbitrarily. Individual fairness is the principle that “similar individuals should be treated similarly” — formally, that the distance between the model’s outputs for any two individuals should be bounded by a task-specific measure of how similar those individuals are (a Lipschitz condition). Individual fairness is harder to operationalize because it requires defining that similarity metric, which itself encodes value judgments. Recent work on counterfactual fairness — “would the model have made the same decision if a protected attribute had been different, holding everything causally downstream constant?” — provides one operationalization but requires a causal model that is rarely available.
In practice, most enterprise AI ethics programs commit to a primary group-level definition and supplement it with individual-level audits on a sample basis.
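One way to run those individual-level audits, offered here as a sketch rather than a prescribed method, is a nearest-neighbor consistency check on a sample of scored individuals: compare each individual's prediction with the predictions given to the most similar individuals in feature space. The feature space and distance metric are themselves assumptions that require review, and the names X_sample and model below are placeholders.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def consistency_score(X, y_pred, n_neighbors=5):
    """Average agreement between each individual's prediction and the
    predictions of their nearest neighbors in feature space.

    A score near 1.0 means similar individuals receive similar decisions.
    X is a 2-D feature array; y_pred is a 1-D array of binary predictions.
    """
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbor_preds = y_pred[idx[:, 1:]]  # column 0 is each point's self-match
    return 1.0 - np.abs(y_pred[:, None] - neighbor_preds).mean()

# Hypothetical usage on a sampled slice of scored applications:
# score = consistency_score(X_sample, model.predict(X_sample))
```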
Implementation Tradeoffs
Choosing a fairness definition is the first decision; implementing it is the second. Implementations cluster into three families.
Pre-processing modifies the training data — for example, by reweighting samples, generating synthetic counterfactual examples, or removing protected attributes (with care, because correlated proxy variables typically remain). Pre-processing is preferred when the development team controls the data pipeline and can document changes for audit.
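As one illustration of a pre-processing intervention, the sketch below computes reweighing weights in the style of Kamiran and Calders, which make the protected attribute statistically independent of the label in the reweighted training data. The column names are placeholders, and this is one technique among the several mentioned above.

```python
import pandas as pd

def reweighing_weights(df, group_col, label_col):
    """Per-row sample weights w(g, y) = P(g) * P(y) / P(g, y), which remove
    the statistical association between group membership and the label
    when the training data is reweighted."""
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / len(df)
    return df.apply(
        lambda row: p_group[row[group_col]] * p_label[row[label_col]]
        / p_joint[(row[group_col], row[label_col])],
        axis=1,
    )

# Hypothetical usage: pass the weights to any estimator that accepts them.
# weights = reweighing_weights(train_df, "gender", "hired")
# model.fit(X_train, y_train, sample_weight=weights)
```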
In-processing modifies the training algorithm itself, typically by adding fairness constraints to the loss function. This produces models that achieve the chosen definition by construction but may sacrifice accuracy and may behave unexpectedly on unseen data.
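A minimal in-processing sketch, using the Fairlearn library discussed later in this article and synthetic data invented for the example, looks roughly like this: the reduction wraps an ordinary estimator and retrains it under a demographic parity constraint.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Tiny synthetic dataset purely for illustration; base rates differ by group.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
sensitive = rng.choice(["A", "B"], size=500)
y = (X[:, 0] + (sensitive == "A") * 0.5 + rng.normal(scale=0.5, size=500) > 0).astype(int)

# The reduction solves a sequence of reweighted problems so that the final
# predictor satisfies the chosen constraint up to a tolerance.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(max_iter=1000),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=sensitive)
print(mitigator.predict(X)[:10])
```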
Post-processing modifies the model’s outputs — for example, by adjusting decision thresholds separately for each group. Post-processing is straightforward to implement and audit but is legally controversial in some jurisdictions because it makes the protected attribute an explicit input to the decision.
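A corresponding post-processing sketch, again using Fairlearn on synthetic data, fits group-specific thresholds on top of an already-trained model. Note that the sensitive feature must be supplied at prediction time, which is exactly the legal concern noted above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

# Synthetic data for illustration only, built the same way as the previous sketch.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
sensitive = rng.choice(["A", "B"], size=500)
y = (X[:, 0] + (sensitive == "A") * 0.5 + rng.normal(scale=0.5, size=500) > 0).astype(int)

base_model = LogisticRegression(max_iter=1000).fit(X, y)

# Learn per-group decision thresholds over the fitted model's scores so that
# the chosen criterion (here demographic parity) holds on the outputs.
postprocessor = ThresholdOptimizer(
    estimator=base_model,
    constraints="demographic_parity",
    prefit=True,
    predict_method="predict_proba",
)
postprocessor.fit(X, y, sensitive_features=sensitive)
adjusted = postprocessor.predict(X, sensitive_features=sensitive)
```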
Each approach carries a measurable accuracy cost. The Aequitas project has published systematic benchmarks showing that enforcing demographic parity typically costs 1–10% in raw accuracy, depending on the dataset and the underlying base-rate disparity. These costs must be transparently reported to decision-makers, not buried in technical appendices.
The Regulatory Layer
Fairness is no longer purely an ethics question; it is increasingly a legal one. The EU AI Act, which entered into force in August 2024, requires high-risk AI systems to undergo conformity assessments that include fairness analysis. The Algorithmic Accountability Act introduced in the US Congress in 2023 (H.R. 5628) would require impact assessments covering bias, fairness, and discrimination for “augmented critical decision processes.” See https://www.congress.gov/bill/118th-congress/house-bill/5628.
International standards bodies are also producing fairness-specific guidance. The Singapore IMDA Model AI Governance Framework includes detailed fairness requirements scaled by use-case risk; see https://www.pdpc.gov.sg/help-and-resources/2020/01/model-ai-governance-framework. The NIST AI Risk Management Framework includes Bias and Fairness as a core measurement domain; see https://www.nist.gov/itl/ai-risk-management-framework.
Practitioners should treat fairness work as both an ethical commitment and a regulatory compliance requirement. The two are mutually reinforcing — the documentation produced for ethical review typically satisfies regulatory evidence requirements as well.
Maturity Indicators
A maturing fairness practice exhibits the following progression of capability:
- Level 1 (Foundational): No fairness analysis is conducted on any model.
- Level 2 (Developing): Fairness is discussed in policy but not measured.
- Level 3 (Defined): Fairness metrics are defined for high-risk use cases, calculated at launch, and documented in the model card.
- Level 4 (Advanced): Fairness metrics are calculated automatically in the build pipeline, monitored in production, and tied to alerting thresholds. Fairness definition choices are documented per model with normative justification.
- Level 5 (Transformational): Fairness performance is published in transparency reports and is part of the organization’s external positioning. The organization contributes to industry fairness standards.
The leap from Level 2 to Level 3 is the hardest. It requires the data science team, the ethics function, and the product owner to converge on a fairness definition for each use case — a conversation that many organizations defer until a regulator or journalist forces it.
Practical Application
Three concrete steps initiate a fairness practice. First, inventory all production AI systems and classify each as high, medium, or low risk based on the consequences of an unfair decision for an affected individual. Second, for each high-risk system, hold an ethics review meeting that ends with a written choice of fairness definition, the metrics that will be reported, and the threshold at which intervention is required. Third, instrument the build pipeline to compute the chosen metrics on every model release and store them alongside accuracy metrics.
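As an illustration of the third step, the sketch below assembles the per-group metrics and an intervention flag into a single record that can be stored next to the accuracy metrics for each release. The metric choices, the 0.10 gap threshold, and the file layout are illustrative assumptions, not part of the methodology; the example uses the Fairlearn library mentioned below.

```python
import json
from sklearn.metrics import accuracy_score, precision_score
from fairlearn.metrics import MetricFrame, selection_rate, true_positive_rate

def fairness_release_record(model_version, y_true, y_pred, sensitive):
    """Compute the agreed fairness metrics per group and package them with
    the overall accuracy so they can be stored with every model release."""
    frame = MetricFrame(
        metrics={
            "selection_rate": selection_rate,          # demographic parity
            "true_positive_rate": true_positive_rate,  # equality of opportunity
            "precision": precision_score,              # predictive parity
        },
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive,
    )
    return {
        "model_version": model_version,
        "accuracy": accuracy_score(y_true, y_pred),
        "by_group": frame.by_group.to_dict(),
        # Largest across-group gap per metric, compared against the
        # intervention threshold agreed in the ethics review.
        "max_gap": frame.difference().to_dict(),
        "intervention_required": bool((frame.difference() > 0.10).any()),
    }

# Hypothetical usage inside the build pipeline:
# record = fairness_release_record("credit-risk-v2.3", y_val, model.predict(X_val), A_val)
# json.dump(record, open("metrics/credit-risk-v2.3.json", "w"), indent=2)
```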
The Partnership on AI’s About ML project publishes templates for documenting these choices and is a useful starting point; see https://partnershiponai.org/. The IBM AI Fairness 360 toolkit and the Microsoft Fairlearn library implement most of the metrics and mitigation algorithms described above and are widely used in enterprise practice.
Looking Ahead
This article has provided the conceptual framework for fairness. Article 3 — M1.11 Algorithmic Bias: Detection, Mitigation, and Continuous Monitoring — turns to the practical work of finding bias in models that are already built and keeping it out of models that are being built. The two articles together equip a practitioner to engage substantively with the most contested ethical question in deployed AI.
© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.