×

Large-scale kernel methods for independence testing. (English) Zbl 1384.62154

Summary: Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable trade-off between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our large-scale methods give comparable performance with existing methods while using significantly less computation time and memory.

MSC:

62G10 Nonparametric hypothesis testing
62H20 Measures of association (correlation, canonical correlation, etc.)
68T05 Learning and adaptive systems in artificial intelligence

Software:

FastMMD
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Anderson, N.H., Hall, P., Titterington, D.M.: Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. J. Multivar. Anal. 50, 41-54 (1994) · Zbl 0798.62055 · doi:10.1006/jmva.1994.1033
[2] Arcones, M.A., Gine, E.: On the bootstrap of \[UU\] and \[VV\] statistics. Ann. Stat. 20(2), 655-674 (1992) · Zbl 0760.62018 · doi:10.1214/aos/1176348650
[3] Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337-404 (1950) · Zbl 0037.20701 · doi:10.1090/S0002-9947-1950-0051437-7
[4] Bach, F., Jordan, M.I.: Kernel independent component analysis. J. Mach.Learn. 10, 1-48 (2002) · Zbl 1088.68689
[5] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, Boston (2004) · Zbl 1145.62002 · doi:10.1007/978-1-4419-9096-9
[6] Bertin-Mahieux, T., Ellis, D.P., Whitman, B., Lamere., P.: The million song dataset. In: International Conference on Music Information Retrieval (ISMIR) (2011) · Zbl 1160.62059
[7] Blaschko, M., Gretton, A.: Learning taxonomies by dependence maximization. Adv. Neural Inf. Process. Syst. 21, 153-160 (2009)
[8] Borgwardt, K.M., Gretton, A., Rasch, M.J., Kriegel, H.P., Schlkopf, B., Smola, A.J.: Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 22(14), e49-e57 (2006). doi:10.1093/bioinformatics/btl242 · doi:10.1093/bioinformatics/btl242
[9] Cho, Y.; Saul, LK; Bengio, Y. (ed.); Schuurmans, D. (ed.); Lafferty, JD (ed.); Williams, CKI (ed.); Culotta, A. (ed.), Kernel methods for deep learning, No. 22, 342-350 (2009), Red Hook
[10] Chwialkowski, K., Gretton, A.: A kernel independence test for random processes. In: Proceedings of the 31st International Conference on Machine Learning (2014) · Zbl 0798.62055
[11] Chwialkowski, K., Ramdas, A., Sejdinovic, D., Gretton, A.: Fast two-sample testing with analytic representations of probability measures. In: Advances in Neural Information Processing Systems (NIPS), vol. 28 (2015)
[12] Cortes, C., Mohri, M., Rostamizadeh, A.: Algorithms for learning kernels based on centered alignment. J. Mach. Learn. Res. 13(1), 795-828 (2012) · Zbl 1283.68286
[13] Dauxois, J., Nkiet, G.M.: Nonlinear canonical analysis and independence tests. Ann. Stat. 26(4), 1254-1278 (1998) · Zbl 0934.62061 · doi:10.1214/aos/1024691242
[14] Flaxman, S.R., Neill, D.B., Smola, A.J.: Gaussian processes for independence tests with non-iid data in causal inference. ACM Trans. Intell. Syst. Technol. 7(2), 22:1-22:23 (2015) · doi:10.1145/2806892
[15] Fukumizu, K., Gretton, A., Sun, X., Schölkopf, B.: Kernel measures of conditional dependence. In: In Adv. NIPS (2008)
[16] Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.: Measuring Statistical Dependence with Hilbert-Schmidt Norms. Lecture Notes in Computer Science, pp. 63-77 (2005) · Zbl 1168.62354
[17] Gretton, A., Fukumizu, K., Schölkopf, B., Teo, C.H., Song, L., Smola, A.J.: A kernel statistical test of independence. In: NIPS (2008)
[18] Gretton, A., Fukumizu, K., Harchaoui, Z., Sriperumbudur, B.: A fast, consistent kernel two-sample test. In: Advances in Neural Information Processing Systems, vol. 22. Curran Associates Inc., Red Hook, pp. 673-681 (2009). papers/1006_paperlong.pdf · Zbl 1283.68286
[19] Gretton, A., Sriperumbudur, B., Sejdinovic, D., Strathmann, H., Balakrishman, S., Pontil, M., Fukumizu, K.: Optimal kernel choice for large-scale two-sample tests. In: Advances in Neural Information Processing Systems (2012a) · Zbl 0037.20701
[20] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13, 723-773 (2012b) · Zbl 1283.62095
[21] Huang, S.Y., Lee, M.H., Hsiao, C.K.: Nonlinear measures of association with kernel canonical correlation analysis and applications. J. Stat. Plan. Inference 139(7), 2162-2174 (2009) · Zbl 1160.62059 · doi:10.1016/j.jspi.2008.10.011
[22] Jitkrittum, W., Szabo, Z., Gretton, A.: An Adaptive Test of Independence with Analytic Kernel Embeddings (2016)
[23] Lai, P., Fyfe, C.: Kernel and nonlinear canonical correlation analysis. Int. J. Neural Syst. 10(5), 365-377 (2000) · doi:10.1142/S012906570000034X
[24] Lopez-Paz, D.: From Dependence to Causation. Ph.D. thesis, University of Cambridge (2016)
[25] Lopez-Paz, D., Hennig, P., Schölkopf, B.: The randomized dependence coefficient. Adv. Neural Inf. Process. Syst. 26, 1-9 (2013)
[26] Lopez-Paz, D., Sra, S., Smola, A., Ghahramani, Z., Schölkopf, B.: Randomized nonlinear component analysis. In: Proceedings of the 31st International Conference on Machine Learning, pp. 1359-1367 (2014)
[27] Lyons, R.: Distance covariance in metric spaces. Ann. Probab. 41(5), 3284-3305 (2013) · Zbl 1292.62087 · doi:10.1214/12-AOP803
[28] Mardia, K., Kent, J., Bibby, J.: Multivariate Analysis. Academic Press, New York (1979) · Zbl 0432.62029
[29] Nguyen, D., Eisenstein, J.: A Kernel Independence Test for Geographical Language Variation (2016). ArXiv:1601.06579 · Zbl 0760.62018
[30] Peters, J., Mooij, J.M., Janzing, D., Schölkopf, B.: Causal discovery with continuous additive noise models. J. Mach. Learn. Res. 15(1), 2009-2053 (2014) · Zbl 1318.68151
[31] Póczos, B., Ghahramani, Z., Schneider, J.G.: Copula-based kernel dependency measures. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26-July 1, 2012 (2012)
[32] Rahimi, A., Recht, B.: Random Features for Large-Scale Kernel Machines. Adv. Neural Inf. Process. Syst. 20 (2007) · Zbl 1283.62095
[33] Reed, M., Simon, B.: Methods of Modern Mathematical Physics. I: Functional Analysis, 2nd edn. Academic Press, New York (1980) · Zbl 0459.46001
[34] Rubenstein, P.K., Chwialkowski, K.P., Gretton, A.: A kernel test for three-variable interactions with random processes. arXiv preprint (2015). ArXiv:1603.00929 · Zbl 0934.62061
[35] Schölkopf, B., Smola, A.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2002)
[36] Sejdinovic, D., Gretton, A., Bergsma, W.: A kernel test for three-variable interactions. Adv. Neural Inf. Process. Syst. (NIPS) 26, 1124-1132 (2013a)
[37] Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.: Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Stat. 41(5), 2263-2291 (2013b) · Zbl 1281.62117 · doi:10.1214/13-AOS1140
[38] Sejdinovic, D., Strathmann, H., De, S., Zaremba, W., Blaschko, M., Gretton., A.: Big Hypothesis Tests with Kernel Embeddings: An Overview. Technical Report (2014). Gatsby Unit, UCL
[39] Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (2002) · Zbl 1001.62005
[40] Smola, A., Gretton, A., Song, L., Schölkop, B.: A Hilbert space embedding for distributions. In: Algorithmic Learning Theory: 18th International Conference, pp. 13-31 (2007) · Zbl 1142.68407
[41] Snelson, E., Ghahramani, Z.: Sparse Gaussian processes using pseudo-inputs. In: Advances in Neural Information Processing Systems, vol. 18, pp. 1257-1264. MIT press (2006)
[42] Song, L., Smola, A., Gretton, A., Borgwardt, K.M.: A dependence maximization view of clustering. In: Proceedings of the 24th International Conference on Machine Learning, pp. 815-822 (2007)
[43] Song, L., Smola, A., Gretton, A., Bedo, J., Borgwardt, K.: Feature selection via dependence maximization. J. Mach. Learn. Res. 13, 1393-1434 (2012). http://jmlr.csail.mit.edu/papers/v13/song12a.html · Zbl 1303.68110
[44] Sriperumbudur, B.K.: Reproducing Kernel Space Embeddings and Metrics on Probability Measures. PhD Thesis, University of California, San Diego (2010)
[45] Steinwart, I., Christmann, A.: Support Vector Machines. Springer, New York (2008) · Zbl 1203.68171
[46] Sutherland, D.J., Schneider, J.: On the error of random fourier features. In: Conference on Uncertainty in Artificial Intelligence (UAI) (2015) · Zbl 1088.68689
[47] Székely, G.J., Rizzo, M.L.: Brownian distance covariance. Ann. Appl. Stat. 3(4), 1236-1265 (2009) · Zbl 1196.62077 · doi:10.1214/09-AOAS312
[48] Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. Ann. Stat. 35(6), 2769-2794 (2007) · Zbl 1129.62059 · doi:10.1214/009053607000000505
[49] Wendland, H.: Scattered Data Approximation. Cambridge University Press, Cambridge (2005) · Zbl 1075.65021
[50] Williams, C.K.I., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Leen, T., Dietterich, T., Tresp, V. (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 682-688. MIT Press (2001). http://papers.nips.cc/paper/1866-using-the-nystrom-method-to-speed-up-kernel-machines
[51] Zaremba, A., Aste, T.: Measures of causality in complex datasets with application to financial data. Entropy 16(4), 2309 (2014) · doi:10.3390/e16042309
[52] Zaremba, W., Gretton, A., Blaschko, M.: B-test: a non-parametric, low variance kernel two-sample test. In: Advances in Neural Information Processing Systems (2013) · Zbl 0798.62055
[53] Zhang, K., Peters, J., Janzing, D., Schölkopf, B.: Kernel-based conditional independence test and application in causal discovery. In: Uncertainty in Artificial Intelligence, pp. 804-813 (2011) · Zbl 0037.20701
[54] Zhao, J., Meng, D.: FastMMD: ensemble of circular discrepancy for efficient two-sample test. Neural Comput. 27(6), 1345-1372 (2015) · Zbl 1476.62082 · doi:10.1162/NECO_a_00732
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.