
Implementation and evaluation of active storage in modern parallel file systems. (English) Zbl 1209.68077
Summary: Active Storage is a technology that aims to reduce the bandwidth requirements of current supercomputing systems and to leverage the processing power of the storage nodes used by some modern file systems. To achieve both objectives, Active Storage moves certain processing tasks to the storage nodes, close to the data they manage. Our proposal for Active Storage has several key features: a user-space implementation that eases porting to different file systems; an analytical model that predicts the performance of Active Storage relative to a traditional system; support for striped files and complex-format files such as netCDF; and a programming and run-time environment suited to scientific users.
68M99 Computer system organization