AUGEM

swMATH ID: 17584
Software Authors: Q. Wang, X. Zhang, Y. Zhang, Q. Yi
Description: AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs. Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual intervention from developers. In particular, based on domain-specific knowledge about the algorithms of the DLA kernels, we use a collection of parameterized code templates to formulate a number of commonly occurring instruction sequences within the optimized low-level C code of these DLA kernels. Then, our framework uses a specialized low-level C optimizer to identify instruction sequences that match the pre-defined code templates and thereby translates them into extremely efficient SSE/AVX instructions. The DLA kernels generated by our template-based approach surpass the implementations of the Intel MKL and AMD ACML BLAS libraries on both Intel Sandy Bridge and AMD Piledriver processors.
Homepage: http://dl.acm.org/citation.cfm?id=2503219
Related Software: LAPACK; R; ATLAS; PARDISO; UMFPACK; CSparse; BLAS; BLIS; CUDA; ADOL-C; Eigen; PRIMME; CASTEP; CIRR; ARPACK; PETSc; SLEPc; ABINIT; hypre; Quantum Espresso
Cited in: 11 Publications
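
For orientation, the two simplest kernels named in the description, AXPY (y := alpha*x + y) and DOT (the inner product of x and y), can be written in plain low-level C as below. This is only an illustrative sketch and is not taken from AUGEM: the function names axpy_ref and dot_ref and the unroll factor of 4 are assumptions chosen for the example. The repeated multiply-add statements in the unrolled loop bodies are the kind of commonly occurring instruction sequence that a template-based optimizer could plausibly match and map to SSE/AVX instructions.

```c
#include <stddef.h>

/* Hypothetical reference AXPY: y[i] += alpha * x[i] for i in [0, n).
 * The main loop is unrolled by 4 so that the same multiply-add
 * pattern appears several times in a row. */
void axpy_ref(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}

/* Hypothetical reference DOT: returns the sum of x[i] * y[i].
 * Four independent accumulators break the dependency chain
 * between consecutive additions. */
double dot_ref(size_t n, const double *x, const double *y)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }
    double sum = (acc0 + acc1) + (acc2 + acc3);
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Because the accumulators in dot_ref are summed in a different order than a straightforward left-to-right loop, results can differ in the last bits for general floating-point input.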
