zbMATH — the first resource for mathematics

Fair data adaptation with quantile preservation. (English) Zbl 07306921
Summary: Fairness of classification and regression has received much attention recently, and various, partially incompatible, criteria have been proposed. The fairness criteria can be enforced for a given classifier or, alternatively, the data can be adapted so that every classifier trained on the adapted data will adhere to the desired fairness criteria. We present a practical data adaptation method based on quantile preservation in causal structural equation models. The data adaptation is based on a presumed counterfactual model for the data. While the counterfactual model itself cannot be verified experimentally, we show that certain population notions of fairness are still guaranteed even if the counterfactual model is misspecified. Which observational, non-causal fairness notion is fulfilled (such as demographic parity, separation, or sufficiency) depends on the structure of the underlying causal model and the choice of resolving variables. We describe an implementation of the proposed data adaptation procedure based on Random Forests and demonstrate its practical use on simulated and real-world data.
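The paper's implementation estimates conditional quantiles with quantile regression forests; as a much simpler illustration of the underlying idea (and not the authors' procedure), the following NumPy sketch performs one-dimensional, unconditional quantile matching: each individual's value is mapped to the value occupying the same quantile in a reference group's empirical distribution. The function name and the choice of empirical quantiles are assumptions made for this sketch.

```python
import numpy as np

def quantile_preserving_adapt(x, group, reference=0):
    """Map each value in x to the value at the same empirical quantile
    of the reference group's distribution (1-D quantile matching)."""
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    ref_sorted = np.sort(x[group == reference])
    adapted = x.copy()
    for g in np.unique(group):
        if g == reference:
            continue  # the reference group is left unchanged
        mask = group == g
        vals = x[mask]
        # empirical quantile of each value within its own group
        ranks = np.searchsorted(np.sort(vals), vals, side="right") / len(vals)
        # value at the same quantile of the reference distribution
        adapted[mask] = np.quantile(ref_sorted, np.clip(ranks, 0.0, 1.0))
    return adapted
```

Because each value keeps its within-group quantile, the ordering inside every group is preserved while the group distributions are aligned; the causal version in the paper applies the same principle to conditional quantiles along the causal graph.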
MSC:
68T05 Learning and adaptive systems in artificial intelligence
Software:
AIF360; ranger; quantreg; CRAN