Abstract
This paper presents a new processor array architecture for scalable radix2 Montgomery modular multiplication algorithm. In this architecture, the multiplicand and the modulus words are allocated to each processing element rather than pipelined between the processing elements as in the previous architecture extracted by C. Koc, and also the multiplier bits are fed serially to the first processing element of the processor array every odd clock cycle. By analyzing this architecture, we found that it has a better performance - in terms of area and speed - than the previous architecture extracted by C. Koc.