Abstract
This paper investigates the problem of minimizing the latency of iterative decoding of low-density parity check codes using the sum-product algorithm on a proposed low-complexity programmable pipelined parallel architecture. We present heuristic techniques for solving the NP-hard combinatorial optimization problems of mapping and scheduling the processing tasks of decoding an arbitrary LDPC code on n parallel pipelined processing units so as to minimize the total number of clock cycles required to complete a single decoding iteration. We compare the quality of result and running time of the proposed techniques to those of simple randomized techniques. We also investigate the effect of using local buffering at the pipelined processing units. In the case of zero local buffering, the proposed mapping and ordering techniques offer an improvement of 75.1% over randomized alternatives for the case of n=16. The proposed mapping technique always leads to an improvement of at least 11% over randomized mapping even if infinite local buffering is used. It is shown that a speedup factor of 1.12n-0.024n/sup 2/ can be achieved using a local buffer size of only 32 words.