AI Fairness 360 - Resources

Welcome to AI Fairness 360

We hope you will use it and contribute to it to help engender trust in AI and make the world more equitable for all.

Machine learning models are increasingly used to inform high-stakes decisions about people. Although machine learning, by its very nature, is always a form of statistical discrimination, the discrimination becomes objectionable when it places certain privileged groups at systematic advantage and certain unprivileged groups at systematic disadvantage. Biases in training data, due to either prejudice in labels or under-/over-sampling, yield models with unwanted bias [1].

The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models. The AI Fairness 360 interactive demo provides a gentle introduction to the concepts and capabilities. The tutorials and other notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.

Because AIF360 offers such a comprehensive set of capabilities, it can be difficult to figure out which metrics and algorithms are most appropriate for a given use case. To help, we have created some guidance material that can be consulted.

We have developed the package with extensibility in mind. We encourage the contribution of your metrics, explainers, and debiasing algorithms. Please join the community to get started as a contributor. The set of implemented metrics and algorithms includes ones described in a number of published papers.

Developer tutorials

The following tutorials provide different examples of detecting and mitigating bias. View them individually below or open the set of Jupyter notebooks on GitHub.

Credit scoring
Detecting and mitigating age bias on decisions to offer credit using the German Credit dataset

Medical expenditure
Detecting and mitigating racial bias in a care management scenario using Medical Expenditure Panel Survey data
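
As a condensed sketch of the first tutorial's bias-detection step, the following loads the German Credit dataset with age as the protected attribute and measures bias in the training data. It assumes the raw German Credit data files have already been downloaded into AIF360's data directory as described in the package documentation.

```python
from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Treat age as the protected attribute; applicants aged 25 or older
# form the privileged group (encoded as 1 after loading).
dataset_orig = GermanDataset(
    protected_attribute_names=['age'],
    privileged_classes=[lambda x: x >= 25],
    features_to_drop=['personal_status', 'sex'])

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

metric_orig = BinaryLabelDatasetMetric(
    dataset_orig,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups)

# A negative mean difference means the unprivileged (younger) group
# receives the favorable outcome less often than the privileged group.
print("Mean difference:", metric_orig.mean_difference())
```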

Guidance on choosing metrics and mitigation

AI Fairness 360 (AIF360) includes many different metrics and algorithms, so making the right selection for a given application can be daunting. We provide some guidance to help. To start, we ask whether AIF360 should be used at all. Then we discuss the choice of metrics. Finally, we discuss the choice of algorithms.

Appropriateness of Toolkit

Fairness is a multifaceted, context-dependent social construct that defies simple definition. The metrics and algorithms in AIF360 may be viewed through the lens of distributive justice [1], and clearly do not capture the full scope of fairness in all situations. The toolkit should only be used in a very limited setting: allocation or risk assessment problems with well-defined protected attributes in which one would like to have some sort of statistical or mathematical notion of sameness. Even then, the code and collateral contained in AIF360 are only a starting point for a broader discussion among multiple stakeholders on overall decision-making workflows.


Metrics

Even in the limited setting for which AIF360 is suitable, there are a large number of fairness metrics that may be appropriate for a given application [2].

Individual vs. Group Fairness, or Both

Group fairness, in its broadest sense, partitions a population into groups defined by protected attributes and seeks for some statistical measure to be equal across groups.  Individual fairness, in its broadest sense, seeks for similar individuals to be treated similarly.  If the application is concerned with individual fairness, then the metrics in the SampleDistortionMetric class should be used.  If the application is concerned with group fairness, then the metrics in the DatasetMetric class (and in its child classes such as the BinaryLabelDatasetMetric class) as well as the ClassificationMetric class (except the ones noted in the next sentence) should be used.  If the application is concerned with both individual and group fairness, and requires the use of a single metric, then the generalized entropy index and its specializations to Theil index and coefficient of variation in the ClassificationMetric class should be used.  Of course, multiple metrics, including ones from both individual and group fairness, can be examined simultaneously.
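
For illustration, the single-metric option might look like the following minimal sketch, where dataset_true and dataset_pred are hypothetical BinaryLabelDataset objects holding the true and predicted labels for a test set.

```python
from aif360.metrics import ClassificationMetric

# dataset_true / dataset_pred are assumed to be BinaryLabelDataset
# objects with the true and predicted labels, respectively.
metric = ClassificationMetric(
    dataset_true, dataset_pred,
    unprivileged_groups=[{'age': 0}],
    privileged_groups=[{'age': 1}])

# Generalized entropy index and its specializations summarize both
# individual and group unfairness in a single number (0 is perfectly fair).
print(metric.generalized_entropy_index(alpha=2))
print(metric.theil_index())               # generalized entropy with alpha = 1
print(metric.coefficient_of_variation())  # derived from the alpha = 2 index
```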

Group Fairness: Data vs. Model

Fairness can be measured at different points in a machine learning pipeline: either on the training data or on the learned model, which also relates to the pre-processing, in-processing, and post-processing categories of bias mitigation algorithms [3] (see the Algorithms section for further discussion). If the application requires metrics on training data, the ones in the DatasetMetric class (and in its child classes such as the BinaryLabelDatasetMetric class) should be used. If the application requires metrics on models, the ones in the ClassificationMetric class should be used.
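
Concretely, the same fairness check can be run at both points in the pipeline. The sketch below assumes train is a BinaryLabelDataset of training data, and test and test_pred are hypothetical datasets holding the true and predicted test labels.

```python
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

groups = dict(unprivileged_groups=[{'age': 0}],
              privileged_groups=[{'age': 1}])

# Bias in the training data itself.
data_metric = BinaryLabelDatasetMetric(train, **groups)
print(data_metric.statistical_parity_difference())

# Bias in the learned model's predictions.
model_metric = ClassificationMetric(test, test_pred, **groups)
print(model_metric.average_odds_difference())
```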

Group Fairness: We’re All Equal vs. What You See Is What You Get

There are two opposing worldviews on group fairness: we're all equal (WAE) and what you see is what you get (WYSIWYG) [4],[5]. The WAE worldview holds that all groups have similar abilities with respect to the task (even if we cannot observe this properly), whereas the WYSIWYG worldview holds that the observations reflect ability with respect to the task.  For example, in college admissions using SAT score as a feature for predicting success in college, the WYSIWYG worldview says that the score correlates well with future success and that there is a way to use the score to correctly compare the abilities of applicants.  In contrast, the WAE worldview says that the SAT score may contain structural biases, so its distribution being different across groups should not be mistaken for a difference in the distribution of ability.

If the application follows the WAE worldview, then the demographic parity metrics should be used: disparate_impact and statistical_parity_difference.  If the application follows the WYSIWYG worldview, then the equality of odds metrics should be used: average_odds_difference and average_abs_odds_difference.  Other group fairness metrics (some are often labeled equality of opportunity) lie in between the two worldviews and may be used as appropriate: false_negative_rate_ratio, false_negative_rate_difference, false_positive_rate_ratio, false_positive_rate_difference, false_discovery_rate_ratio, false_discovery_rate_difference, false_omission_rate_ratio, false_omission_rate_difference, error_rate_ratio, and error_rate_difference.  To choose among these, the right side of the decision tree in the guidance material may be consulted.
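
For reference, here is how a few of those named metrics map onto calls on the hypothetical ClassificationMetric instance from the earlier sketch.

```python
# We're-all-equal (WAE) worldview: demographic parity metrics.
print(metric.disparate_impact())
print(metric.statistical_parity_difference())

# What-you-see-is-what-you-get (WYSIWYG) worldview: equality of odds.
print(metric.average_odds_difference())
print(metric.average_abs_odds_difference())

# In-between metrics, e.g. equality-of-opportunity style error rates.
print(metric.false_negative_rate_difference())
print(metric.error_rate_ratio())
```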

Group Fairness: Ratios vs. Differences

As can be observed from the lists of metrics above, AIF360 has both difference and ratio versions of metrics. Both convey the same information and the choice among them should be made based on the comfort of the users examining the results.
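
As a concrete illustration of the two forms, the statistical parity pair can be written as follows, where Pr(Ŷ = 1 | D = d) denotes the rate of favorable outcomes in group d:

```latex
\[
\mathrm{statistical\_parity\_difference}
  = \Pr(\hat{Y} = 1 \mid D = \mathrm{unprivileged})
  - \Pr(\hat{Y} = 1 \mid D = \mathrm{privileged}),
\]
\[
\mathrm{disparate\_impact}
  = \frac{\Pr(\hat{Y} = 1 \mid D = \mathrm{unprivileged})}
         {\Pr(\hat{Y} = 1 \mid D = \mathrm{privileged})}.
\]
```

A difference of 0 and a ratio of 1 both indicate parity between the groups.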


Algorithms

Bias mitigation algorithms attempt to improve the fairness metrics by modifying the training data, the learning algorithm, or the predictions. These algorithm categories are known as pre-processing, in-processing, and post-processing, respectively [3].


The choice among algorithm categories can partially be made based on the user persona's ability to intervene at different parts of a machine learning pipeline.  If the user is allowed to modify the training data, then pre-processing can be used.  If the user is allowed to change the learning algorithm, then in-processing can be used.  If the user can only treat the learned model as a black box, without any ability to modify the training data or learning algorithm, then only post-processing can be used.  AIF360 recommends the earliest mitigation category in the pipeline that the user has permission to apply, because it gives the most flexibility and opportunity to correct bias as much as possible. If possible, all algorithms from all permissible categories should be tested, because the ultimate performance depends on dataset characteristics: there is no one best algorithm independent of dataset [6].

Further Considerations

Among pre-processing algorithms, reweighing only changes the weights applied to training samples; it does not change any feature or label values. Therefore, it may be the preferred option in cases where the application does not allow value changes. Disparate impact remover and optimized pre-processing yield modified datasets in the same space as the input training data, whereas learned fair representations (LFR) yields a pre-processed dataset in a latent space.  If the application requires transparency on the transformation, then disparate impact remover and optimized pre-processing may be preferred. Moreover, optimized pre-processing addresses both group fairness and individual fairness.
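
As an illustration of the least invasive option, the sketch below applies reweighing to the hypothetical training set from the earlier credit-scoring sketch and re-checks the metric on the weighted data.

```python
from aif360.algorithms.preprocessing import Reweighing
from aif360.metrics import BinaryLabelDatasetMetric

RW = Reweighing(unprivileged_groups=[{'age': 0}],
                privileged_groups=[{'age': 1}])

# Only instance weights are changed; features and labels stay intact.
dataset_transf = RW.fit_transform(dataset_orig)

metric_transf = BinaryLabelDatasetMetric(
    dataset_transf,
    unprivileged_groups=[{'age': 0}],
    privileged_groups=[{'age': 1}])
print(metric_transf.mean_difference())  # should now be (near) zero
```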

Among in-processing algorithms, prejudice remover is limited to learning algorithms that allow for regularization terms, whereas the adversarial debiasing algorithm accommodates a more general set of learning algorithms and may be preferred for that reason.
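
A minimal sketch of the regularization-based option, assuming dataset_train and dataset_test are hypothetical BinaryLabelDataset objects with 'age' as the protected attribute:

```python
from aif360.algorithms.inprocessing import PrejudiceRemover

# eta controls the strength of the fairness regularization term.
pr = PrejudiceRemover(eta=25.0, sensitive_attr='age')
pr.fit(dataset_train)

# Returns a copy of the test set with predicted scores and labels.
dataset_test_pred = pr.predict(dataset_test)
```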

Among post-processing algorithms, the two equalized odds post-processing algorithms have a randomized component, whereas the reject option algorithm is deterministic and may be preferred for that reason.
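
For the deterministic option, here is a sketch of reject option classification tuned on a validation split; dataset_valid (true labels) and dataset_valid_pred (the black-box model's scores for the same instances) are hypothetical placeholders.

```python
from aif360.algorithms.postprocessing import RejectOptionClassification

roc = RejectOptionClassification(
    unprivileged_groups=[{'age': 0}],
    privileged_groups=[{'age': 1}],
    metric_name='Statistical parity difference')

# Search for classification thresholds that satisfy the fairness constraint.
roc.fit(dataset_valid, dataset_valid_pred)

# Apply the tuned thresholds to new predictions.
dataset_test_fair = roc.predict(dataset_test_pred)
```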

The current AIF360 implementations of some algorithms take arguments specifying which fairness metric to optimize (e.g. optimized pre-processing and reject option) and some do not (e.g. disparate impact remover and equalized odds post-processing), which may lead some algorithms to perform better or worse with respect to particular metrics.  The effect of improving one fairness metric on other fairness metrics is complicated [7].


Glossary

Bias
A systematic error. In the context of fairness, we are concerned with unwanted bias that places privileged groups at systematic advantage and unprivileged groups at systematic disadvantage.

Bias mitigation algorithm
A procedure for reducing unwanted bias in training data or models.

Classifier
A model that predicts categorical labels from features.

Explainer
Functionality for providing details on or causes for fairness metric results.

Fairness metric
A quantification of unwanted bias in training data or models.

Favorable label
A label whose value corresponds to an outcome that provides an advantage to the recipient. The opposite is an unfavorable label.

Feature
An attribute containing information for predicting the label.

Group fairness
The goal of groups defined by protected attributes receiving similar treatments or outcomes.

In-processing algorithm
A bias mitigation algorithm that is applied to a model during its training.

Individual fairness
The goal of similar individuals receiving similar treatments or outcomes.

Instance weight
A numerical value that multiplies the contribution of a data point in a model.

Label
A value corresponding to an outcome.

Machine learning
A general approach for determining models from data.

Model
A function that takes features as input and predicts labels as output.

Post-processing algorithm
A bias mitigation algorithm that is applied to predicted labels.

Pre-processing algorithm
A bias mitigation algorithm that is applied to training data.

Privileged protected attribute
A value of a protected attribute indicating a group that has historically been at systematic advantage.

Protected attribute
An attribute that partitions a population into groups whose outcomes should have parity. Examples include race, gender, caste, and religion. Protected attributes are not universal, but are application specific.

Score
A continuous value output from a classifier. Applying a threshold to a score results in a predicted label.

Training data
A dataset from which a model is learned.

Transformer
A procedure that modifies a dataset.

AI Fairness and Explainability

To inspect model biases, a complementary approach to fairness metrics is to add transparency to the AI system through explainability. By directly exposing how the model makes its predictions, explanations can help people examine, identify, and ultimately correct biases and discrimination in machine learning models. When the model is unbiased, for example after applying the bias mitigation algorithms provided in this toolkit, effective explanation can assure people of the model's fairness and foster trust.

Research shows that people need a diverse set of explanation capabilities to fully scrutinize model biases.  For example, one may want to inspect whether there is discrimination in the overall logic of the model. Others may want to ensure that they are not being unfairly treated by comparing the model's decisions for them with its decisions for other individuals.

To learn more about the effectiveness and user preferences of explanation capabilities for supporting fairness judgment of machine learning models, read a recent paper:

Jonathan Dodge, Q. Vera Liao, Yunfeng Zhang, Rachel K.E. Bellamy, and Casey Dugan
"Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment"
ACM International Conference on Intelligent User Interfaces, 2019

To learn more about AI Explainability and try state-of-the-art algorithms that provide a diverse set of explanation capabilities, visit IBM Research AI Explainability 360, an open source toolkit for interpretable machine learning.