Visualizing bivariate long-tailed data. (English) Zbl 1264.00033

Summary: Variables in large data sets in biology or e-commerce often have a head, made up of very frequent values, and a long tail of ever rarer values. Models such as the Zipf or Zipf-Mandelbrot law provide a good description of such variables. The problem we address here is the visualization of two such long-tailed variables, as one might see in a bivariate Zipf context. We introduce a copula plot to display the joint behavior of such variables. The plot uses an empirical ordering of the data; we prove that this ordering is asymptotically accurate in a Zipf-Mandelbrot-Poisson model. We often see an association between entities at the head of one variable and those from the tail of the other. We present two generative models (saturation and bipartite preferential attachment) that show such qualitative behavior, and we characterize the power law behavior of the marginal distributions in these models.
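Illustration (not taken from the paper): the idea of a copula plot built on an empirical ordering can be sketched roughly as follows. The sketch assumes independent Zipf-Mandelbrot margins with weights proportional to (i + q)^(-alpha), orders entities by decreasing observed count, and bins each observation by the midpoint of its entity's cumulative share of the data. The function names, parameter values, and the midpoint convention are hypothetical choices made for this example; the construction in the paper may differ in its details.

```python
import random
from collections import Counter


def zipf_mandelbrot_weights(n, alpha=1.2, q=2.0):
    """Unnormalized weights (i + q)^(-alpha) for ranks i = 1..n (assumed form)."""
    return [(i + q) ** (-alpha) for i in range(1, n + 1)]


def empirical_positions(labels):
    """Map each entity to a position in (0, 1) on the empirical-ordering axis.

    Entities are sorted by decreasing observed count (ties broken by label).
    Each entity is placed at the midpoint of the interval of cumulative data
    share that it occupies, which is an assumption of this sketch.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda e: (-counts[e], e))
    total = len(labels)
    positions = {}
    seen = 0
    for entity in ordered:
        positions[entity] = (seen + counts[entity] / 2) / total
        seen += counts[entity]
    return positions


def copula_grid(xs, ys, bins=10):
    """Count observations in a bins x bins grid on the unit square."""
    px = empirical_positions(xs)
    py = empirical_positions(ys)
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        i = min(int(px[x] * bins), bins - 1)
        j = min(int(py[y] * bins), bins - 1)
        grid[i][j] += 1
    return grid


# Simulated data: two independent long-tailed variables (hypothetical sizes).
rng = random.Random(0)
n_obs = 20000
xs = rng.choices(range(500), weights=zipf_mandelbrot_weights(500), k=n_obs)
ys = rng.choices(range(300), weights=zipf_mandelbrot_weights(300), k=n_obs)
grid = copula_grid(xs, ys)
```

Cells near the origin of the grid correspond to head entities on both axes, and cells far from it correspond to tail entities. With real data, an excess of counts in the head-by-tail corners relative to this independent baseline would be the kind of association the summary describes.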

MSC:

00A66 Mathematics and visual arts
62A09 Graphical methods in statistics
05C90 Applications of graph theory
62P25 Applications of statistics to social sciences
90B15 Stochastic network models in operations research
91D30 Social networks; opinion dynamics
