Using k-fold cross-validation (kFCV) to predict the accuracy that a classifier will have on unseen data can be done reliably only in the absence of dataset shift, i.e., only when the training data and the unseen data are IID. However, there are many cases in which deploying a classifier “in the wild” instead means confronting “out-of-distribution” data, i.e., data affected by dataset shift; predicting the accuracy of a classifier on such data is an open problem. In this talk I will discuss the case in which the unlabelled data are affected by prior probability shift (PPS, aka “label shift”), an important (if not the most important) type of dataset shift. I will discuss a solution to the problem of classifier accuracy prediction that makes use of “quantification” algorithms robust to PPS, i.e., algorithms devised for estimating the relative frequencies of the classes in unseen data affected by PPS. I will present the results of systematic experiments in which the method is “stress-tested”, i.e., asked to predict the accuracy of a classifier on data samples into which variable amounts of prior probability shift have been artificially injected. I will show that the method yields surprisingly small prediction error.
(Joint work with Lorenzo Volpi, Andrea Esuli, and Alejandro Moreo)
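To make the general idea concrete, here is a minimal sketch (not the exact method presented in the talk) of quantification-based accuracy prediction in the binary case: a PPS-robust quantifier, here Adjusted Classify & Count (ACC) chosen purely for illustration, estimates the class prevalence of an unlabelled, shifted test sample, and this estimate is combined with the classifier’s class-conditional accuracies (which are invariant under prior probability shift) to predict overall accuracy. The synthetic dataset, logistic regression classifier, and shift-injection protocol are all illustrative assumptions.

```python
# Sketch: predict classifier accuracy under prior probability shift (PPS)
# using a quantifier (ACC) to estimate class prevalences on unlabelled data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Training/validation data (IID) plus a large pool used to build shifted test samples
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.5, 0.5], random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, train_size=5000, random_state=0)
X_val, X_pool, y_val, y_pool = train_test_split(X_rest, y_rest, train_size=3000, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Validation estimates of the classifier's class-conditional behaviour
val_pred = clf.predict(X_val)
tpr = np.mean(val_pred[y_val == 1] == 1)   # P(h(x)=1 | y=1)
fpr = np.mean(val_pred[y_val == 0] == 1)   # P(h(x)=1 | y=0)

def predict_accuracy(X_unlabelled):
    """Predict accuracy on an unlabelled sample via ACC quantification."""
    obs_pos_rate = np.mean(clf.predict(X_unlabelled) == 1)
    # ACC: correct the observed positive rate using validation tpr/fpr
    pi_hat = np.clip((obs_pos_rate - fpr) / (tpr - fpr), 0.0, 1.0)
    # Under PPS, per-class accuracies (tpr, 1 - fpr) carry over to the test sample
    return pi_hat * tpr + (1.0 - pi_hat) * (1.0 - fpr)

# "Stress test": artificially inject prior probability shift by resampling the
# pool at prevalences far from the training prevalence (0.5)
for prev in [0.05, 0.25, 0.50, 0.75, 0.95]:
    n = 1000
    n_pos = int(round(prev * n))
    pos_idx = rng.choice(np.where(y_pool == 1)[0], n_pos, replace=False)
    neg_idx = rng.choice(np.where(y_pool == 0)[0], n - n_pos, replace=False)
    idx = np.concatenate([pos_idx, neg_idx])
    X_te, y_te = X_pool[idx], y_pool[idx]
    true_acc = np.mean(clf.predict(X_te) == y_te)
    pred_acc = predict_accuracy(X_te)
    print(f"prevalence={prev:.2f}  true acc={true_acc:.3f}  predicted acc={pred_acc:.3f}")
```

The key design point the sketch illustrates is that, under PPS, only the class priors change while the class-conditional distributions stay fixed, so a prevalence estimate from a PPS-robust quantifier is exactly the reweighting needed to translate validation-time, per-class performance into a prediction of accuracy on the shifted sample.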