Setting decision thresholds when operating conditions are uncertain. (English) Zbl 1458.68163

Summary: The quality of the decisions made by a machine learning model depends on the data and the operating conditions during deployment. Often, operating conditions such as class distribution and misclassification costs have changed since the model was trained and evaluated. When deploying a binary classifier that outputs scores, once we know the new class distribution and the new cost ratio between false positives and false negatives, there are several methods in the literature to help us choose an appropriate threshold for the classifier’s scores. However, on many occasions, the information that we have about this operating condition is uncertain. Previous work has considered ranges or distributions of operating conditions during deployment, with expected costs being calculated for ranges or intervals, but still the decision for each point is made as if the operating condition were certain. The implications of this assumption have received limited attention: a threshold choice that is best suited without uncertainty may be suboptimal under uncertainty. In this paper we analyse the effect of operating condition uncertainty on the expected loss for different threshold choice methods, both theoretically and experimentally. We model uncertainty as a second conditional distribution over the actual operating condition and study it theoretically in such a way that minimum and maximum uncertainty are both seen as special cases of this general formulation. This is complemented by a thorough experimental analysis investigating how different learning algorithms behave for a range of datasets according to the threshold choice method and the uncertainty level.
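
The setting described in the summary can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' R code: the function names (`loss`, `expected_loss`), the toy score distributions, the uniform draw of the true cost proportion, and the Beta noise model for the believed cost proportion are all hypothetical choices made for illustration. The threshold is chosen from the believed operating condition, while the loss is evaluated at the true one. The `concentration` parameter runs from 0 (the believed value carries no information) to infinity (no uncertainty).

```python
import numpy as np


def loss(scores, labels, t, c):
    """Cost-weighted loss at threshold t for true cost proportion c.

    An example is predicted positive when its score is >= t. False
    positives are weighted by c and false negatives by (1 - c).
    """
    pred = scores >= t
    fp = np.mean(pred & (labels == 0))   # fraction of all examples that are false positives
    fn = np.mean(~pred & (labels == 1))  # fraction of all examples that are false negatives
    return c * fp + (1.0 - c) * fn


def expected_loss(scores, labels, choose_t, concentration, n_draws=2000, seed=0):
    """Monte Carlo estimate of the loss when the threshold is chosen from a
    noisy belief about c but evaluated at the true c.

    Assumption (illustrative only): the true c is uniform on [0, 1] and the
    believed c follows a Beta distribution whose mode is the true c.
    """
    rng = np.random.default_rng(seed)
    true_c = rng.uniform(0.0, 1.0, size=n_draws)
    if np.isinf(concentration):
        believed_c = true_c  # no uncertainty: belief equals truth
    else:
        a = true_c * concentration + 1.0
        b = (1.0 - true_c) * concentration + 1.0
        believed_c = rng.beta(a, b)  # concentration 0 gives Beta(1, 1), i.e. uniform
    losses = [loss(scores, labels, choose_t(ch), c)
              for ch, c in zip(believed_c, true_c)]
    return float(np.mean(losses))


# Toy scores (hypothetical data): positives tend to score high, negatives low.
rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=500)
scores = np.where(labels == 1, rng.beta(4, 2, size=500), rng.beta(2, 4, size=500))

# Two threshold choice methods: follow the believed c, or ignore it.
methods = {
    "score-driven (t = believed c)": lambda c_hat: c_hat,
    "fixed (t = 0.5)": lambda c_hat: 0.5,
}

for k in (np.inf, 10.0, 0.0):
    for name, choose_t in methods.items():
        print(f"concentration={k}: {name}: "
              f"{expected_loss(scores, labels, choose_t, k):.4f}")
```

Running the loop prints the estimated expected loss of each method at three uncertainty levels, which makes it possible to check on toy data how a method that relies on the believed operating condition fares as that belief becomes less reliable.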

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

GitHub; RWeka; UCI-ml

References:

[1] Adams N, Hand D (1999) Comparing classifiers when the misallocation costs are uncertain. Pattern Recognit 32(7):1139-1147 · doi:10.1016/S0031-3203(98)00154-X
[2] Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ (2013) On the effect of calibration in classifier combination. Appl Intell 38(4):566-585 · doi:10.1007/s10489-012-0388-2
[3] Bishop C (2011) Embracing uncertainty: applied machine learning comes of age. In: Machine learning and knowledge discovery in databases. Springer, Berlin, pp 4
[4] Brier GW (1950) Verification of forecasts expressed in terms of probability. Monthly Weather Rev 78(1):1-3 · doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2
[5] Dalton LA (2016) Optimal ROC-based classification and performance analysis under Bayesian uncertainty models. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 13(4):719-729 · doi:10.1109/TCBB.2015.2465966
[6] de Melo C, Eduardo C, Bastos Cavalcante Prudencio R (2014) Cost-sensitive measures of algorithm similarity for meta-learning. In: 2014 Brazilian conference on intelligent systems (BRACIS). IEEE, pp 7-12
[7] Dou H, Yang X, Song X, Yu H, Wu WZ, Yang J (2016) Decision-theoretic rough set: a multicost strategy. Knowl-Based Syst 91:71-83 · doi:10.1016/j.knosys.2015.09.011
[8] Drummond C, Holte RC (2000) Explicitly representing expected cost: an alternative to ROC representation. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, KDD ’00, pp 198-207
[9] Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier performance. Mach Learn 65(1):95-130 · doi:10.1007/s10994-006-8199-5
[10] Elkan C (2001) The foundations of cost-sensitive learning. In: Proceedings of the 17th international joint conference on artificial intelligence, vol 2. Morgan Kaufmann Publishers Inc., IJCAI’01, pp 973-978
[11] Fawcett T (2003) In vivo spam filtering: a challenge problem for KDD. ACM SIGKDD Explor Newsl 5(2):140-148 · doi:10.1145/980972.980990
[12] Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861-874 · doi:10.1016/j.patrec.2005.10.010
[13] Fawcett T, Niculescu-Mizil A (2007) PAV and the ROC convex hull. Mach Learn 68(1):97-106 · Zbl 1470.62082 · doi:10.1007/s10994-007-5011-0
[14] Ferri C, Flach PA, Hernández-Orallo J (2017) R code for threshold choice methods with context uncertainty. https://github.com/ceferra/ThresholdChoiceMethods/tree/master/Uncertainty
[15] Flach P (2004) The many faces of ROC analysis in machine learning. Tutorial. In: Proceedings of the twenty-first international conference on machine learning (ICML 2004)
[16] Flach P (2014) Classification in context: adapting to changes in class and cost distribution. In: First international workshop on learning over multiple contexts at European conference on machine learning and principles and practice of knowledge discovery in databases ECML-PKDD’2014
[17] Flach P, Matsubara ET (2007) A simple lexicographic ranker and probability estimator. In: 18th European conference on machine learning, ECML2007. Springer, pp 575-582
[18] Flach P, Hernández-Orallo J, Ferri C (2011) A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the 28th international conference on machine learning, ICML2011
[19] Guzella TS, Caminhas WM (2009) A review of machine learning approaches to spam filtering. Expert Syst Appl 36(7):10206-10222 · doi:10.1016/j.eswa.2009.02.037
[20] Hand D (2009) Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn 77(1):103-123 · Zbl 1470.62085 · doi:10.1007/s10994-009-5119-5
[21] Hernández-Orallo J, Flach P, Ferri C (2011) Brier curves: a new cost-based visualisation of classifier performance. In: Proceedings of the 28th international conference on machine learning, ICML2011
[22] Hernández-Orallo J, Flach P, Ferri C (2012) A unified view of performance metrics: translating threshold choice into expected classification loss. J Mach Learn Res 13(1):2813-2869 · Zbl 1436.62260
[23] Hernández-Orallo J, Flach P, Ferri C (2013) ROC curves in cost space. Mach Learn 93(1):71-91 · Zbl 1293.68225 · doi:10.1007/s10994-013-5328-9
[24] Hornik K, Buchta C, Zeileis A (2009) Open-source machine learning: R meets Weka. Comput Stat 24(2):225-232 · Zbl 1232.62007 · doi:10.1007/s00180-008-0119-7
[25] Huang Y (2015) Dynamic cost-sensitive naive Bayes classification for uncertain data. Int J Database Theory Appl 8(1):271-280 · doi:10.14257/ijdta.2015.8.1.26
[26] Johnson RA, Raeder T, Chawla NV (2015) Optimizing classifiers for hypothetical scenarios. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, Berlin, pp 264-276
[27] Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
[28] Liu M, Zhang Y, Zhang X, Wang Y (2011) Cost-sensitive decision tree for uncertain data. In: Advanced data mining and applications. Springer, Berlin, pp 243-255
[29] Liu XY, Zhou ZH (2010) Learning with cost intervals. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 403-412
[30] Provost F, Fawcett T (2001) Robust classification for imprecise environments. Mach Learn 42(3):203-231 · Zbl 0969.68126 · doi:10.1023/A:1007601015854
[31] Provost FJ, Fawcett T et al (1997) Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions. KDD 97:43-48
[32] Qin B, Xia Y, Li F (2009) DTU: a decision tree for uncertain data. In: Advances in knowledge discovery and data mining. Springer, Berlin, pp 4-15
[33] Ren J, Lee SD, Chen X, Kao B, Cheng R, Cheung D (2009) Naive Bayes classification of uncertain data. In: Ninth IEEE international conference on data mining, 2009. ICDM’09. IEEE, pp 944-949
[34] Ridzuan F, Potdar V, Talevski A (2010) Factors involved in estimating cost of email spam. In: Taniar D, Gervasi O, Murgante B, Pardede E, Apduhan BO (eds). Berlin, pp 383-399 · doi:10.1007/978-3-642-12165-4_31
[35] Sakkis G, Androutsopoulos I, Paliouras G, Karkaletsis V, Spyropoulos CD, Stamatopoulos P (2003) A memory-based approach to anti-spam filtering for mailing lists. Inf Retr 6(1):49-73 · doi:10.1023/A:1022948414856
[36] Tsang S, Kao B, Yip KY, Ho WS, Lee SD (2011) Decision trees for uncertain data. IEEE Trans Knowl Data Eng 23(1):64-78 · doi:10.1109/TKDE.2009.175
[37] Wang R, Tang K (2012) Minimax classifier for uncertain costs. arXiv preprint arXiv:1205.0406
[38] Zadrozny B, Elkan C (2001a) Learning and making decisions when costs and probabilities are both unknown. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 204-213
[39] Zadrozny B, Elkan C (2001b) Obtaining calibrated probability estimates from decision trees and Naive Bayesian classifiers. In: Proceedings of the eighteenth international conference on machine learning (ICML 2001), pp 609-616
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.