Abstract
In this paper, we present a domain-programmable (code-independent) parallel architecture for efficiently implementing iterative probabilistic decoding of LDPC codes. The architecture is based on distributed computing and message passing. The exploited parallelism was found to be communication-limited. To increase the utilization of the computational resources, we separate the routing and state-management functionalities performed by physical nodes from the computation functionalities performed by function units (FUs) that can be shared by multiple physical nodes. Simulation results show that the proposed architecture improves FU utilization by 251%, 116%, and 209% compared to a hypothetical fully parallel custom implementation, a fully sequential implementation, and a proprietary FPGA custom implementation, respectively, all of which use the same core FU design. Compared to an implementation on a shared-memory general-purpose parallel machine, the proposed architecture exhibits a 75.6% improvement in scalability. We also introduce a new low-cost store-and-forward routing algorithm for deadlock avoidance in torus networks.