Abstract
Matrix-vector multiplication is a computationally intensive kernel operation used in many image processing applications. This paper presents a preliminary Field Programmable Gate Array (FPGA) design and implementation of dense matrix-vector multiplication for use in an image processing application. The design is optimized for speed, which is the main requirement for such applications. The design has been implemented on a Virtex-4 FPGA using Xilinx ISE 9.2i, and its performance is evaluated by computing the execution time on the FPGA. The implementation results demonstrate a maximum throughput of 16970 frames per second, which is adequate for most real-time image processing applications, while utilizing only 14% of the Virtex-4 slices and 57% of the DSP48 blocks.