Abstract
On-line division is one of the slowest of the basic arithmetic operations and therefore tends to become a bottleneck in networks of on-line modules that use it. A higher-radix divider can potentially attain higher throughput than a radix-2 divider and thereby improve the overall throughput of networks that require division. However, this throughput gain is not straightforward to obtain with radix 4, because several components of the divider become more complex than in the radix-2 case. Previously proposed radix-4 designs rely on operand pre-scaling to simplify the selection function and reduce the critical-path delay. This comes at the cost of more complex algorithm conditions and operations, as well as a variable on-line delay, which is particularly unattractive at the low precisions typical of DSP applications. These designs also require separate phases for pre-scaling and for the division itself. This paper proposes a design approach based on overlapped replication that yields a radix-4 on-line division module with low algorithmic complexity, a single division phase, fewer restrictions on the input values, and a small, fixed on-line delay.