The QC-2 parallel queue processor architecture. (English) Zbl 1243.68092

Summary: Queue-based instruction set architecture processors offer an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel queue processor (PQP), with single-precision floating-point support. The QC-2 core also implements a novel technique for extending immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor.
A prototype implementation is produced by synthesizing the high-level model for a target FPGA device. We present the architecture description and design results in a fair amount of detail.

MSC:

68M20 Performance evaluation, queueing, and scheduling in the context of computer systems
68M14 Distributed systems

Software:

MediaBench; MiBench
