×

zbMATH — the first resource for mathematics

Continuous monitoring of distributed data streams over a time-based sliding window. (English) Zbl 1236.68011
Summary: In this paper we extend the study of algorithms for monitoring distributed data streams from whole data streams to a time-based sliding window. The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is \(O(\frac{k}{\varepsilon} \log\frac{\varepsilon N}{k})\) bits for basic counting, \(O(\frac{k}{\varepsilon} \log\frac{N}{k})\) words for frequent items and \(O(\frac{k}{\varepsilon^{2}} \log\frac{N}{k})\) words for quantiles, where \(k\) is the number of distributed data streams, \(N\) is the total number of items in the streams that arrive or expire in the window, and \(\varepsilon <1\) is the given error bound. The performance of our algorithms matches and nearly matches the corresponding lower bounds. We also show how to generalize these results to streams with out-of-order data.

MSC:
68M10 Network design and communication in computer systems
68M14 Distributed systems
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Aggarwal, C.: Data Streams: Models and Algorithms. Springer, Berlin (2006) · Zbl 1126.68033
[2] Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58(1), 137–147 (1999) · Zbl 0938.68153
[3] Arackaparambil, C., Brody, J., Chakrabarti, A.: Functional monitoring without monotonicity. In: Proceedings of the 36th International Colloquium on Automata, Languages and Programming, pp. 95–106 (2009) · Zbl 1248.68085
[4] Arasu, A., Manku, G.: Approximate counts and quantiles over sliding windows. In: Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 286–296 (2004)
[5] Babai, L., Gal, A., Kimmel, P., Lokam, S.: Communication complexity of simultaneous messages. SIAM J. Comput. 33(1), 137–166 (2004) · Zbl 1069.68051
[6] Babcock, B., Datar, M., Motwani, R.: Sampling from a moving window over streaming data. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 633–634 (2002) · Zbl 1093.68571
[7] Babcock, B., Olston, C.: Distributed top-k monitoring. In: Proceedings of the 22nd ACM SIGMOD International Conference on Management of Data, pp. 28–39 (2003)
[8] Busch, C., Tirthapua, S.: A deterministic algorithm for summarizing asynchronous streams over a sliding window. In: Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science, pp. 265–476 (2007) · Zbl 1186.68014
[9] Cormode, G., Garofalakis, M.: Sketching streams through the net: distributed approximate query tracking. In: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 13–24 (2005)
[10] Cormode, G., Garofalakis, M., Muthukrishnan, S., Rastogi, R.: Holistic aggregates in a networked world: distributed tracking of approximate quantiles. In: Proceedings of the 24th ACM SIGMOD International Conference on Management of Data, pp. 25–36 (2005)
[11] Cormode, G., Korn, F., Tirthapura, S.: Time-decaying aggregates in out-of-order streams. In: Proceedings of the 27th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 89–98 (2008)
[12] Cormode, G., Muthukrishnan, S., Yi, K.: Algorithms for distributed functional monitoring. In: Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1076–1085 (2008) · Zbl 1192.68080
[13] Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the 29th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 77–86 (2010) · Zbl 1281.68094
[14] Das, A., Ganguly, S., Garofalakis, M., Rastogi, R.: Distributed set-expression cardinality estimation. In: Proceedings of the 30th International Conference on Very Large Data Bases, pp. 312–323 (2004)
[15] Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows. SIAM J. Comput. 31(6), 1794–1813 (2002) · Zbl 1008.68039
[16] Datar, M., Muthukrishnan, S.: Estimating rarity and similarity over data stream windows. In: Proceedings of the 10th Annual European Symposium on Algorithms, pp. 323–334 (2002) · Zbl 1019.68533
[17] Demaine, E., Lopez-Ortiz, A., Munro, J.: Frequency estimation of internet packet streams with limited space. In: Proceedings of the 10th Annual European Symposium on Algorithms, pp. 348–360 (2002) · Zbl 1019.68502
[18] Gibbons, P., Tirthapura, S.: Distributed streams algorithms for sliding windows. In: Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 63–72 (2002) · Zbl 1093.68143
[19] Greenwald, M., Khanna, S.: Power-conserving computation of order-statistics over sensor networks. In: Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 275–285 (2004)
[20] Guha, S., Koudas, N., Shim, K.: Data-streams and histograms. In: Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pp. 471–475 (2001) · Zbl 1323.68567
[21] Indyk, P.: Stable distributions, pseudorandom generators, embeddings and data stream computation. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 148–155 (2000)
[22] Jain, N., Yalagandula, P., Dahlin, M., Zhang, Y.: Insight: A distributed monitoring system for tracking continuous queries. In: Proceedings of the 20th ACM Symposium on Operating Systems Principles, pp. 1–7 (2005)
[23] Keralapura, R., Cormode, G., Ramamirtham, J.: Communication-efficient distributed monitoring of thresholded counts. In: Proceedings of the 25th ACM SIGMOD International Conference on Management of Data, pp. 289–300 (2006)
[24] Lee, L.K., Ting, H.F.: Maintaining significant stream statistics over sliding windows. In: 17th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 724–732 (2006) · Zbl 1192.68182
[25] Lee, L.K., Ting, H.F.: A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Proceedings of the 25th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 290–297 (2006)
[26] Manjhi, A., Shkapenyuk, V., Dhamdhere, K., Olston, C.: Finding (recently) frequent items in distributed data streams. In: Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 767–778 (2005)
[27] Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: Proceedings of the 25th ACM SIGMOD International Conference on Management of Data, pp. 635–646 (2006)
[28] Muthukrishnan, S.: Data Streams: Algorithms and Applications. Now Publishers, Hanover (2005) · Zbl 1128.68025
[29] Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: Proceedings of the 22nd ACM SIGMOD International Conference on Management of Data, pp. 563–574 (2003)
[30] Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Transactions on Database Systems 32(4) (2007)
[31] Yi, K., Zhang, Q.: Optimal tracking of distributed heavy hitters and quantiles. In: Proceedings of the 28th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 167–174 (2009)
[32] Yi, K., Zhang, Q.: Private communication
[33] Yi, K., Zhang, Q.: Multi-dimensional online tracking. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1098–1107 (2009)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.