AI Fairness 360 - Resources
Welcome to IBM Research AI Fairness 360
We hope you will use it and contribute to it to help engender trust in AI and make the world more equitable for all.
Machine learning models are increasingly used to inform high stakes decisions about people. Although machine learning, by its very nature, is always a form of statistical discrimination, the discrimination becomes objectionable when it places certain privileged groups at systematic advantage and certain unprivileged groups at systematic disadvantage. Biases in training data, due to either prejudice in labels or under-/over-sampling, yield models with unwanted bias.
The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models. The AI Fairness 360 interactive demo provides a gentle introduction to the concepts and capabilities. The tutorials and other notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.
Because the toolkit offers such a comprehensive set of capabilities, it may be difficult to figure out which metrics and algorithms are most appropriate for a given use case. To help, we have created some guidance material that can be consulted.
We have developed the package with extensibility in mind. We encourage the contribution of your metrics, explainers, and debiasing algorithms. Please join the community to get started as a contributor. The set of implemented metrics and algorithms includes ones described in the following list of papers:
- Flavio P. Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney, “Optimized Pre-Processing for Discrimination Prevention”, Conference on Neural Information Processing Systems, 2017.
- Elisa Celis, Lingxiao Huang, Vijay Keswani, and Nisheeth Vishnoi, “Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees”, 2018.
- Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian, “Certifying and Removing Disparate Impact”, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.
- Moritz Hardt, Eric Price, and Nathan Srebro, “Equality of Opportunity in Supervised Learning”, Conference on Neural Information Processing Systems, 2016.
- Faisal Kamiran and Toon Calders, “Data Preprocessing Techniques for Classification without Discrimination”, Knowledge and Information Systems, 2012.
- Faisal Kamiran, Asim Karim, and Xiangliang Zhang, “Decision Theory for Discrimination-Aware Classification”, IEEE International Conference on Data Mining, 2012.
- Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma, “Fairness-Aware Classifier with Prejudice Remover Regularizer”, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2012.
- Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q. Weinberger, “On Fairness and Calibration”, Conference on Neural Information Processing Systems, 2017.
- Till Speicher, Hoda Heidari, Nina Grgic-Hlaca, Krishna P. Gummadi, Adish Singla, Adrian Weller, and Muhammad Bilal Zafar, “A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices”, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2018.
- Richard Zemel, Yu (Ledell) Wu, Kevin Swersky, Toniann Pitassi, and Cynthia Dwork, “Learning Fair Representations”, International Conference on Machine Learning, 2013.
- Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell, “Mitigating Unwanted Biases with Adversarial Learning”, AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.
The following tutorials provide different examples of detecting and mitigating bias. View them individually below or open the set of Jupyter notebooks on GitHub.
- Detecting and mitigating age bias on decisions to offer credit using the German Credit dataset
- Detecting and mitigating racial bias in a care management scenario using Medical Expenditure Panel Survey data
- Detecting and mitigating bias in automatic gender classification of face images
Guidance on choosing metrics and mitigation
AI Fairness 360 (AIF360) includes many different metrics and algorithms, which may result in a daunting problem of making the right selection for a given application. We provide some guidance to help. To start, we ask whether AIF360 should be used at all. Then we discuss the choice of metrics. Finally, we discuss the choice of algorithms.
Appropriateness of Toolkit
Fairness is a multifaceted, context-dependent social construct that defies simple definition. The metrics and algorithms in AIF360 may be viewed through the lens of distributive justice, and clearly do not capture the full scope of fairness in all situations. The toolkit should only be used in a very limited setting: allocation or risk assessment problems with well-defined protected attributes in which one would like to have some sort of statistical or mathematical notion of sameness. Even then, the code and collateral contained in AIF360 are only a starting point for a broader discussion among multiple stakeholders on overall decision making workflows.
Even in the limited setting for which AIF360 is suitable, there are a large number of fairness metrics that may be appropriate for a given application.
Individual vs. Group Fairness, or Both
Group fairness, in its broadest sense, partitions a population into groups defined by protected attributes and seeks for some statistical measure to be equal across groups. Individual fairness, in its broadest sense, seeks for similar individuals to be treated similarly. If the application is concerned with individual fairness, then the metrics in the SampleDistortionMetric class should be used. If the application is concerned with group fairness, then the metrics in the DatasetMetric class (and in its child classes such as the BinaryLabelDatasetMetric class) as well as the ClassificationMetric class (except the ones noted in the next sentence) should be used. If the application is concerned with both individual and group fairness, and requires the use of a single metric, then the generalized entropy index and its specializations to Theil index and coefficient of variation in the ClassificationMetric class should be used. Of course, multiple metrics, including ones from both individual and group fairness, can be examined simultaneously.
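Because the generalized entropy index is defined directly on per-individual benefits, it is easy to illustrate by hand. The following pure-Python sketch computes the quantity that the corresponding ClassificationMetric methods report, using the benefit function b_i = ŷ_i − y_i + 1 from Speicher et al.; the function name and toy data here are illustrative, not part of the aif360 API.

```python
from math import log

def generalized_entropy_index(y_true, y_pred, alpha=2):
    # Per-individual benefit (Speicher et al. 2018): b_i = yhat_i - y_i + 1,
    # so a false positive gives b = 2, a false negative b = 0, a correct
    # prediction b = 1.
    b = [yp - yt + 1 for yt, yp in zip(y_true, y_pred)]
    mu = sum(b) / len(b)
    if alpha == 1:
        # Theil index: the limit of the generalized entropy index as alpha -> 1
        return sum((bi / mu) * log(bi / mu) for bi in b if bi > 0) / len(b)
    return sum((bi / mu) ** alpha - 1 for bi in b) / (len(b) * alpha * (alpha - 1))

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]   # one false negative, one false positive
theil = generalized_entropy_index(y_true, y_pred, alpha=1)
ge2 = generalized_entropy_index(y_true, y_pred, alpha=2)
```

A perfectly accurate classifier gives every individual the same benefit (b_i = 1) and so scores 0; any spread in benefits across individuals, whatever group they belong to, pushes the index above 0.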
Group Fairness: Data vs. Model
Fairness can be measured at different points in a machine learning pipeline: either on the training data or on the learned model, which also relates to the pre-processing, in-processing, and post-processing categories of bias mitigation algorithms (see the Algorithms section for further discussion). If the application requires metrics on training data, the ones in the DatasetMetric class (and in its child classes such as the BinaryLabelDatasetMetric class) should be used. If the application requires metrics on models, the ones in the ClassificationMetric class should be used.
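The same group fairness quantity can be computed at either point in the pipeline: on the true labels it measures bias in the data, and on the predictions it measures bias in the model. This minimal sketch shows statistical parity difference computed both ways; the helper, the group coding (0 = unprivileged, 1 = privileged), and the toy data are our assumptions, not aif360 API.

```python
def base_rate(labels, groups, group):
    # P(Y = favorable | group), with the favorable label coded as 1
    selected = [y for y, g in zip(labels, groups) if g == group]
    return sum(selected) / len(selected)

groups = [0, 0, 0, 1, 1, 1]   # 0 = unprivileged, 1 = privileged (assumed coding)
y_true = [1, 0, 0, 1, 1, 0]   # training labels
y_pred = [0, 0, 0, 1, 1, 1]   # model predictions

# Dataset-level metric (what BinaryLabelDatasetMetric measures): bias in the data
data_spd = base_rate(y_true, groups, 0) - base_rate(y_true, groups, 1)

# Model-level metric (what ClassificationMetric measures): bias in the model
model_spd = base_rate(y_pred, groups, 0) - base_rate(y_pred, groups, 1)
```

Here the training data is mildly biased (difference of about −0.33) but the learned model amplifies it (difference of −1.0), which is why measuring only the data can be insufficient.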
Group Fairness: We’re All Equal vs. What You See Is What You Get
There are two opposing worldviews on group fairness: we’re all equal (WAE) and what you see is what you get (WYSIWYG). The WAE worldview holds that all groups have similar abilities with respect to the task (even if we cannot observe this properly), whereas the WYSIWYG worldview holds that the observations reflect ability with respect to the task. For example, in college admissions, where SAT score is used as a feature for predicting success in college, the WYSIWYG worldview says that the score correlates well with future success and that there is a way to use the score to correctly compare the abilities of applicants. In contrast, the WAE worldview says that the SAT score may contain structural biases, so its distribution being different across groups should not be mistaken for a difference in the distribution of ability.
If the application follows the WAE worldview, then the demographic parity metrics should be used: disparate_impact and statistical_parity_difference. If the application follows the WYSIWYG worldview, then the equality of odds metrics should be used: average_odds_difference and average_abs_odds_difference. Other group fairness metrics (some are often labeled equality of opportunity) lie in-between the two worldviews and may be used appropriately: false_negative_rate_ratio, false_negative_rate_difference, false_positive_rate_ratio, false_positive_rate_difference, false_discovery_rate_ratio, false_discovery_rate_difference, false_omission_rate_ratio, false_omission_rate_difference, error_rate_ratio, and error_rate_difference. To choose among these, the right side of the decision tree here may be consulted.
Group Fairness: Ratios vs. Differences
As can be observed from the lists of metrics above, AIF360 has both difference and ratio versions of metrics. Both convey the same information and the choice among them should be made based on the comfort of the users examining the results.
Algorithms
Bias mitigation algorithms attempt to improve the fairness metrics by modifying the training data, the learning algorithm, or the predictions. These algorithm categories are known as pre-processing, in-processing, and post-processing, respectively.
The choice among algorithm categories can partially be made based on the user persona's ability to intervene at different parts of a machine learning pipeline. If the user is allowed to modify the training data, then pre-processing can be used. If the user is allowed to change the learning algorithm, then in-processing can be used. If the user can only treat the learned model as a black box, without any ability to modify the training data or learning algorithm, then only post-processing can be used. AIF360 recommends the earliest mitigation category in the pipeline that the user has permission to apply because it gives the most flexibility and opportunity to correct bias as much as possible. If possible, all algorithms from all permissible categories should be tested because the ultimate performance depends on dataset characteristics: there is no one best algorithm independent of dataset.
Among pre-processing algorithms, reweighing only changes the weights applied to training samples; it does not change any feature or label values. Therefore, it may be a preferred option when the application does not allow feature or label values to be changed. Disparate impact remover and optimized pre-processing yield modified datasets in the same space as the input training data, whereas LFR’s pre-processed dataset is in a latent space. If the application requires transparency on the transformation, then disparate impact remover and optimized pre-processing may be preferred options. Moreover, optimized pre-processing addresses both group fairness and individual fairness.
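Reweighing is simple enough to sketch from its published formula: each (group, label) cell is weighted by w(g, y) = P(g)P(y)/P(g, y), which makes the protected attribute and the label statistically independent under the weighted distribution. This is a pure-Python illustration of the idea from Kamiran and Calders, not aif360's Reweighing class; the group coding and toy data are assumptions.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    # Reweighing (Kamiran & Calders 2012): weight each (group, label) cell by
    #   w(g, y) = P(g) * P(y) / P(g, y) = (n_g * n_y) / (n * n_gy)
    # so that group membership and label become independent when weighted.
    n = len(labels)
    n_g = Counter(groups)
    n_y = Counter(labels)
    n_gy = Counter(zip(groups, labels))
    return {(g, y): n_g[g] * n_y[y] / (n * n_gy[(g, y)]) for g, y in n_gy}

groups = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = unprivileged, 1 = privileged (assumed)
labels = [1, 0, 0, 0, 1, 1, 1, 0]   # favorable label coded as 1
w = reweighing_weights(groups, labels)

def weighted_favorable_rate(group):
    # Weighted P(Y = favorable | group) after reweighing
    num = sum(w[(g, y)] * y for g, y in zip(groups, labels) if g == group)
    den = sum(w[(g, y)] for g, y in zip(groups, labels) if g == group)
    return num / den
```

Before reweighing the favorable rates are 0.25 and 0.75; after applying the weights, both groups' weighted favorable rates are equal, so the weighted statistical parity difference is zero, without any feature or label being altered.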
Among in-processing algorithms, the prejudice remover is limited to learning algorithms that allow for regularization terms whereas the adversarial debiasing algorithm allows for a more general set of learning algorithms, and may be preferred for that reason.
Among post-processing algorithms, the two equalized odds post-processing algorithms have a randomized component, whereas the reject option algorithm is deterministic and may be preferred for that reason.
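The deterministic character of the reject option approach can be seen in a simplified sketch: within a "critical region" around the decision threshold, where the classifier is least certain, unprivileged instances receive the favorable label and privileged instances the unfavorable one. This is an illustration of the idea from Kamiran et al., not aif360's implementation; the threshold, margin, group coding, and scores are all illustrative assumptions.

```python
def reject_option_predict(scores, groups, threshold=0.5, margin=0.1):
    # Reject option classification (Kamiran et al. 2012), simplified sketch:
    # inside the critical region |score - threshold| <= margin, flip outcomes
    # in favor of the unprivileged group (coded 0); outside it, threshold as usual.
    preds = []
    for s, g in zip(scores, groups):
        if abs(s - threshold) <= margin:
            preds.append(1 if g == 0 else 0)
        else:
            preds.append(1 if s > threshold else 0)
    return preds

scores = [0.45, 0.9, 0.2, 0.55, 0.58, 0.1]
groups = [0,    0,   0,   1,    1,    1]
preds = reject_option_predict(scores, groups)
```

Given the same scores, this procedure always returns the same predictions, in contrast to post-processing methods that randomize a fraction of the outcomes to equalize odds.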
The current AIF360 implementations of some algorithms take arguments specifying which fairness metric to optimize (e.g. optimized pre-processing and reject option) and some do not (e.g. disparate impact remover and equalized odds post-processing), which may imply better or worse performance by some algorithms with respect to some metrics. The effect of improving one fairness metric on other fairness metrics is complicated.
Glossary
Bias
A systematic error. In the context of fairness, we are concerned with unwanted bias that places privileged groups at systematic advantage and unprivileged groups at systematic disadvantage.
Bias mitigation algorithm
A procedure for reducing unwanted bias in training data or models.
Classifier
A model that predicts categorical labels from features.
Explainer
Functionality for providing details on or causes for fairness metric results.
Fairness metric
A quantification of unwanted bias in training data or models.
Favorable label
A label whose value corresponds to an outcome that provides an advantage to the recipient. The opposite is an unfavorable label.
Feature
An attribute containing information for predicting the label.
Group fairness
The goal of groups defined by protected attributes receiving similar treatments or outcomes.
In-processing algorithm
A bias mitigation algorithm that is applied to a model during its training.
Individual fairness
The goal of similar individuals receiving similar treatments or outcomes.
Instance weight
A numerical value that multiplies the contribution of a data point in a model.
Label
A value corresponding to an outcome.
Machine learning
A general approach for determining models from data.
Model
A function that takes features as input and predicts labels as output.
Post-processing algorithm
A bias mitigation algorithm that is applied to predicted labels.
Pre-processing algorithm
A bias mitigation algorithm that is applied to training data.
Privileged protected attribute
A value of a protected attribute indicating a group that has historically been at systematic advantage.
Protected attribute
An attribute that partitions a population into groups whose outcomes should have parity. Examples include race, gender, caste, and religion. Protected attributes are not universal, but are application specific.
Score
A continuous value output from a classifier. Applying a threshold to a score results in a predicted label.
Training data
A dataset from which a model is learned.
Transformer
A procedure that modifies a dataset.