
Successive normalization of rectangular arrays. (English) Zbl 1189.62109

Ann. Stat. 38, No. 3, 1638-1664 (2010); correction ibid. 41, No. 5, 2700-2702 (2013).
Summary: Standard statistical techniques often require transforming data to have mean 0 and standard deviation 1. Typically, this process of “standardization” or “normalization” is applied across subjects when each subject produces a single number. High throughput genomic and financial data often come as rectangular arrays where each coordinate in one direction concerns subjects who might have different status (case or control, say), and each coordinate in the other designates “outcome” for a specific feature, for example, “gene,” “polymorphic site” or some aspect of financial profile. It may happen, when analyzing data that arrive as a rectangular array, that one requires both the subjects and the features to be “on the same footing.” Thus there may be a need to standardize across rows and columns of the rectangular matrix. There arises the question as to how to achieve this double normalization. We propose and investigate the convergence of what seems to us a natural approach to successive normalization which we learned from our colleague Bradley Efron. We also study the implementation of the method on simulated data and also on data that arose from scientific experimentation.
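The summary describes standardizing "across rows and columns" of a rectangular array by successive normalization, but it does not spell out the procedure. Below is a minimal sketch of one natural reading: alternately give every row mean 0 and standard deviation 1, then every column mean 0 and standard deviation 1, and repeat. It is an illustration under stated assumptions, not the authors' exact method. The function names `standardize` and `successive_normalize`, the divide-by-n (population) standard deviation, the row-first ordering, the tolerance `tol`, the cap `max_iter` and the stopping rule are all hypothetical choices made here.

```python
import math


def standardize(values):
    """Shift a sequence to mean 0 and scale it to standard deviation 1.

    Assumption (not from the paper): the SD divides by n, not n - 1.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    sd = math.sqrt(sum(c * c for c in centered) / n)
    if sd == 0.0:
        # A constant row or column cannot be scaled to SD 1.
        raise ValueError("cannot standardize a constant row or column")
    return [c / sd for c in centered]


def successive_normalize(matrix, tol=1e-10, max_iter=1000):
    """Alternately standardize the rows, then the columns, of a rectangular array.

    Hypothetical sketch. Returns (array, sweeps_used, converged). One sweep is a
    row step followed by a column step. The loop stops once the column step
    moves the freshly row-standardized array by less than `tol` in every entry,
    so the returned array is (numerically) standardized in both directions.
    """
    current = [[float(x) for x in row] for row in matrix]
    for sweep in range(1, max_iter + 1):
        # Row step: every row gets mean 0 and SD 1.
        row_std = [standardize(row) for row in current]
        # Column step: transpose, standardize each column, transpose back.
        col_std = [standardize(list(col)) for col in zip(*row_std)]
        current = [list(row) for row in zip(*col_std)]
        # Largest entrywise change made by this sweep's column step.
        change = max(
            abs(a - b)
            for new_row, old_row in zip(current, row_std)
            for a, b in zip(new_row, old_row)
        )
        if change < tol:
            return current, sweep, True
    # Stopping rule not met within max_iter sweeps.
    return current, max_iter, False
```

For example, `successive_normalize([[1, 2, 3, 4], [2, 5, 1, 7], [9, 3, 4, 2], [4, 8, 6, 1]])` returns the doubly normalized 4 × 4 array, the number of sweeps used and a flag saying whether the stopping rule was met. The stopping rule compares the column step's output with the row-standardized array of the same sweep, rather than comparing consecutive sweeps, so that stopping directly certifies that rows and columns are both standardized. The sketch does not reproduce the paper's convergence analysis; `max_iter` only guards against slow or failed convergence.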

MSC:

62H99 Multivariate analysis
60F15 Strong limit theorems
60G46 Martingales and classical analysis
62H05 Characterization and structure theory for multivariate probability distributions; copulas
65C60 Computational problems in statistics (MSC2010)
