Abstract
Fast mesh compression is becoming a requirement in several applications such as medical imaging and video games. Graphics Processing Units (GPUs) have recently become massively parallel devices for Single Instruction, Multiple Data (SIMD) computing, which in turn raises greater implementation challenges. Transformation and Quantization (TQ) is considered the second most computationally expensive stage of wavelet-based mesh coding, so accelerating it further improves the overall coding speed. In this paper, an OpenCL (Open Computing Language) acceleration of TQ is proposed. The Butterfly Wavelet Transform (BWT) based on the unlifted scheme is adopted for the transformation, while embedded deadzone quantization is employed for the wavelet coefficients. A chunk rearrangement process is applied to compute the neighborhood information needed by the Butterfly subdivision stencils. Accordingly, every chunk independently performs the prediction of its wavelet coefficients and their quantization. The key insights behind the proposed GPU-based TQ method are smart memory management and efficient memory data mapping. Extensive experimental assessments demonstrate the effectiveness of our GPU implementation in terms of memory and runtime costs, while preserving the rate-distortion performance of the state-of-the-art Bitplane coder.
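For readers unfamiliar with embedded deadzone quantization, the following minimal Python sketch illustrates the textbook form of the technique on scalar coefficients. It is not the OpenCL kernel proposed in this paper: the function names (`deadzone_quantize`, `drop_bitplanes`, `dequantize`), the step size, and the reconstruction offset `delta = 0.5` are assumptions chosen for illustration only.

```python
import math

def deadzone_quantize(c, step):
    """Map coefficient c to a signed integer index.

    Every |c| < step falls into the zero bin, so the zero bin (the
    "deadzone") is twice as wide as the other bins.
    """
    sign = -1 if c < 0 else 1
    return sign * int(math.floor(abs(c) / step))

def drop_bitplanes(q, k):
    """Discard the k least significant magnitude bitplanes of index q.

    This is the "embedded" property: the result equals the index that a
    coarser deadzone quantizer with step * 2**k would have produced.
    """
    sign = -1 if q < 0 else 1
    return sign * (abs(q) >> k)

def dequantize(q, step, delta=0.5):
    """Reconstruct a coefficient from its index.

    delta places the reconstruction point inside the bin; 0.5 (the bin
    midpoint) is an assumed, commonly used value.
    """
    if q == 0:
        return 0.0
    sign = -1 if q < 0 else 1
    return sign * (abs(q) + delta) * step
```

Because each coefficient is quantized without reference to any other, a step of this kind maps naturally onto one work-item per coefficient, which is consistent with the per-chunk independence described above.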