Abstract
This paper presents a new processor array architecture for the scalable radix-8 Montgomery modular multiplication algorithm. In this architecture, the multiplicand and modulus words are allocated to each processing element rather than being pipelined between the processing elements as in the previous architectures extracted by G. Todorov. Moreover, the multiplier bits are fed serially to the first processing element of the processor array every odd clock cycle. By analyzing this architecture, we found that it outperforms the previous radix-8 architecture extracted by G. Todorov in terms of area, speed, and power consumption.
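For orientation, the arithmetic that such an architecture computes can be sketched at the integer level. The following is a minimal sketch of plain radix-8 Montgomery multiplication, which consumes three multiplier bits per iteration. It does not model the word-based scalable variant, the processing elements, or the clock-cycle schedule described above. The function name `montgomery_radix8` and all variable names are assumptions made for illustration, and the operands are assumed to satisfy 0 <= x, y < m with m odd.

```python
def montgomery_radix8(x: int, y: int, m: int) -> int:
    """Return x * y * R^-1 mod m, where R = 8**d and d = ceil(bitlen(m) / 3).

    Illustrative sketch only: x is the multiplier, y the multiplicand,
    m the odd modulus. Requires Python 3.8+ for pow(m, -1, 8).
    """
    if m < 3 or m % 2 == 0:
        raise ValueError("modulus must be odd and at least 3")
    if not (0 <= x < m and 0 <= y < m):
        raise ValueError("operands must satisfy 0 <= x, y < m")

    digits = (m.bit_length() + 2) // 3   # number of radix-8 digits of the multiplier
    m_prime = (-pow(m, -1, 8)) % 8       # -m^-1 mod 8, precomputed once

    s = 0
    for i in range(digits):
        x_i = (x >> (3 * i)) & 7         # next three multiplier bits, least significant first
        s += x_i * y                     # add the partial product
        q = (s * m_prime) & 7            # quotient digit that clears the low three bits
        s = (s + q * m) >> 3             # exact division by 8
    # s stays below 2*m throughout, so one conditional subtraction suffices
    return s - m if s >= m else s
```

The result carries the extra factor R^-1, so in practice operands are first converted to Montgomery form (multiplied by R mod m) and converted back after the final multiplication.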