×

Mining blackhole and volcano patterns in directed graphs: a general approach. (English) Zbl 1260.91217

Summary: Given a directed graph, the problem of blackhole mining is to identify groups of nodes, called blackhole patterns, in a way such that the average in-weight of this group is significantly larger than the average out-weight of the same group. The problem of finding volcano patterns is a dual problem of mining blackhole patterns. Therefore, we focus on discovering the blackhole patterns. Indeed, in this article, we develop a generalized blackhole mining framework. Specifically, we first design two pruning schemes for reducing the computational cost by reducing both the number of candidate patterns and the average computation cost for each candidate pattern. The first pruning scheme is to exploit the concept of combination dominance to reduce the exponential growth search space. Based on this pruning approach, we develop the gBlackhole algorithm. Instead, the second pruning scheme is an approximate approach, named approxBlackhole, which can strike a balance between the efficiency and the completeness of blackhole mining. Finally, experimental results on real-world data show that the performance of approxBlackhole can be several orders of magnitude faster than gBlackhole, and both of them have huge computational advantages over the brute-force approach. Also, we show that the blackhole mining algorithm can be used to capture some suspicious financial fraud patterns.

MSC:

91D30 Social networks; opinion dynamics
05C90 Applications of graph theory
91-04 Software, source code, etc. for problems pertaining to game theory, economics, and finance
05C85 Graph algorithms (graph-theoretic aspects)
05-04 Software, source code, etc. for problems pertaining to combinatorics
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] Adamic L, Brunetti C, Harris J, Kirilenko A (2010) Trading networks. SSRN eLibrary. http://ssrn.com/paper=1361184
[2] Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: Spotting anomalies in weighted graphs. In: Proceedings of the 14th pacific-Asia conference on knowledge discovery and data mining (PAKDD’10), Hyderabad, pp 410–421
[3] Barnett V, Lewis T (1994) Outliers in statistical data. Wiley, New York · Zbl 0801.62001
[4] Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) Lof: Identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data (ACM SIGMOD’00), Providence, pp 93–104
[5] Chakrabarti D (2004) Autopart: Parameter-free graph partitioning and outlier detection. In: Proceedings of the 8th European conference on principles and practice of knowledge discovery in databases (PKDD’04), Pisa, pp 112–124
[6] Chaudhary A, Szalay AS, Moore AW (2002) Very fast outlier detection in large multidimensional data sets. In: Proceedings of ACM SIGMOD workshop on research issues in data mining and knowledge discovery, Dalas
[7] Cook DJ, Holder LB (1994) Substructure discovery using minimum description length and background knowledge. J Artif Intel Res (JAIR) 1: 231–255
[8] Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms. The MIT Press, Cambridge
[9] Diestel R (2006) Graph theory (Graduate texts in mathematics). Springer, Heidelberg
[10] Gehrke J, Ginsparg P, Kleinberg JM (2003) Overview of the 2003 KDD Cup. In: ACM SIGKDD Explorations 5(2):149–151
[11] Ghosh R, Lerman K (2008) Community detection using a measure of global influence. In: The 2nd SNA-KDD workshop on social network mining and analysis (SNA-KDD’08), Las Vegas
[12] Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826 · Zbl 1032.91716
[13] Hawkins D (1980) Identification of outliers. Chapman and Hall, Dordrecht · Zbl 0438.62022
[14] Hopcroft J, Khan O, Kulis B, Selman B (2003) Natural communities in large linked networks. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (ACM SIGKDD’03), Washington
[15] Huan J, Wang W, Prins J (2003) Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In: Proceedings of the 3rd IEEE international conference on data mining (IEEE ICDM’03), Melbourne
[16] Jiang X, Xiong H, Wang C, Tan AH (2009) Mining globally distributed frequent subgraphs in a single labeled graph. Data Knowl Eng 68: 1034–1058 · Zbl 05841541
[17] Johnson RA, Wichern DW (1998) Applied multivariate statistical analysis. Prentice Hall, New York
[18] Knuth D (2011) The art of computer programming, Vol 4A: combinatorial algorithms. Addison-Wesley, Boston · Zbl 1237.00033
[19] Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Mining Knowl Discov 11(3): 243–271
[20] Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery and data mining (ACM SIGKDD’05), Chicago, pp 157–166
[21] Leskovec J, Faloutsos C (2006) Sampling from Large Graphs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (ACM SIGKDD’06), Philadelphia, pp 631–636
[22] Leskovec J, Huttenlocher D, Kleinberg J (2010a) Predicting Positive and Negative Links in Online Social Networks. In: Proceedings of the 19th international world wide web conference (WWW’10), Raleigh
[23] Leskovec J, Huttenlocher D, Kleinberg J (2010b) Signed Networks in Social Media. In: Proceedings of the 28th ACM conference on human factors in computing systems (CHI’10), Atlanta
[24] Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD’05), Chicago
[25] Leskovec J, Lang K, Dasgupta A, Mahoney M (2008) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. In: arXiv.org:0810.1355 · Zbl 1205.91144
[26] Li Z, Xiong H, Liu Y, Zhou A (2010) Detecting Blackhole and Volcano Patterns in Directed Networks. In: Proceedings of the 10th IEEE International Conference on Data Mining (IEEE ICDM’10), Australia, pp 294–303
[27] Mehlhorn K, Naher S (1999) The LEDA platform of combinatorial and geometric computing. Cambridge University Press, Cambridge
[28] Moonesinghe HDK, Tan P-N (2008) Outrank: a graph-based outlier detection framework using random walk. Int J Artif Intel Tools 17(1):19–36
[29] Newman MEJ (2004) Detecting community structure in networks. Eur Phys J B 38: 321–330
[30] Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69: 026113
[31] Noble CC, Cook DJ (2003) Graph-based anomaly detection. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (ACM SIGKDD’03), Washington, pp 631–636
[32] Papadimitriou S, Kitagawa H, Gibbons PB, Faloutsos C (2003) Loci: Fast outlier detection using the local correlation integral. In: Proceedings of the 19th international conference on data engineering (ICDE’03), Bangalore, pp 315–326
[33] Pathak N, DeLong C, Banerjee A, Erickson K (2008) Social topic models for community extraction. In: The 2nd SNA-KDD Workshop on Social Network Mining and Analysis (SNA-KDD’08), Las Vegas
[34] Steyvers M, Smyth P, Rosen-Zvi M, Griffiths T (2004) Probabilistic author-topic models for information discovery. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (ACM SIGKDD’04), Magdeburg
[35] Sun J, Qu H, Chakrabarti D, Faloutsos C (2005) Neighborhood formation and anomaly detection in bipartite graph. In: Proceedings of the 5th IEEE international conference on data mining (IEEE ICDM’05), Houston, pp 418–425
[36] Tan P-N, Steinbach M, Kumar V (2005) Introduction to data mining. Addison Wesley, Boston
[37] Wang C, Wang W, Pei J, Zhu Y, Shi B (2004) Scalable mining of large disk-based graph databases. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (ACM SIGKDD’04), Magdeburg
[38] Wang J, Hsu W, Lee M, Sheng C (2006) A partition-based approach to graph mining. In: Proceedings of the 22nd international conference on data engineering (ICDE’06), Atlanta, p 74
[39] Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of the 2nd IEEE international conference on data mining (IEEE ICDM’02), Maebashi
[40] Zhou D, Manavoglu E, Li J, Giles CL, Zha H (2006) Probabilistic models for discovering e-communities. In: Proceedings of the 15th international world wide web conference (WWW’06), Edinburgh
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.