Authors: Yongfeng Qiu, Yuxiao Li, Xin Liang, Yafan Huang, Guanpeng Li, Sheng Di, Franck Cappello, Hanqi Guo

Abstract
We present a novel use of error-bounded lossy compression to accelerate distributed parallel volume rendering, which requires blending many semi-transparent images rendered by distributed processes, a step known as parallel image compositing. Specifically, we significantly improve the widely adopted binary-swap algorithm by compressing intermediate images while strictly bounding the pixel-wise error by a user-specified tolerance. To bound the output error, we propose a fine-grained error-bound model for every round of communication in binary swap. The error bounds are derived from the visibility of each intermediate pixel, which is characterized by its opacity and by occlusions from other processes. As a result, we introduce a two-round process that first computes occlusions losslessly and then blends lossily compressed colors in a distributed fashion. Our algorithm also adaptively decides, for each process and each communication round in binary swap, whether lossy compression reduces communication time. We evaluate our algorithm with an end-to-end GPU parallel volume rendering pipeline that combines a CUDA-accelerated renderer and compressor with CUDA-aware MPI, using up to 256 GPUs on the Perlmutter supercomputer.
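To illustrate why exact occlusions allow a per-pixel error budget, the following is a minimal single-pixel sketch in Python. It is not the error-bound model of this paper: it assumes front-to-back compositing of premultiplied colors with the standard over operator, exact (lossless) opacities, and an equal split of the tolerance across layers. The function names and the budget rule `tolerance / (n * T_i)` are hypothetical choices made for illustration only.

```python
import random


def transmittance(alphas):
    """Return T_i = prod_{j<i} (1 - alpha_j) for layers ordered front to back."""
    out, t = [], 1.0
    for a in alphas:
        out.append(t)
        t *= 1.0 - a
    return out


def composite(colors, alphas):
    """Front-to-back 'over' blend of premultiplied colors for one pixel."""
    return sum(t * c for t, c in zip(transmittance(alphas), colors))


def per_layer_bounds(alphas, tolerance):
    """Hypothetical budget split (an assumption, not the paper's model).

    A color error e_i on layer i reaches the output scaled by T_i, so
    allowing e_i <= tolerance / (n * T_i) keeps the summed output error
    within the tolerance. A fully occluded layer (T_i == 0) is unconstrained.
    """
    n = len(alphas)
    return [tolerance / (n * t) if t > 0.0 else float("inf")
            for t in transmittance(alphas)]


if __name__ == "__main__":
    random.seed(0)
    alphas = [0.3, 0.5, 0.2, 0.8]   # exact opacities, front to back
    colors = [0.2, 0.4, 0.1, 0.6]   # premultiplied colors (c_i <= alpha_i)
    tol = 0.01
    bounds = per_layer_bounds(alphas, tol)
    # Simulate lossy compression as a bounded perturbation of each color.
    noisy = [c + random.uniform(-b, b) for c, b in zip(colors, bounds)]
    err = abs(composite(noisy, alphas) - composite(colors, alphas))
    print("per-layer bounds:", bounds)
    print("output error within tolerance:", err <= tol)
```

In this sketch, layers hidden behind more opaque content receive looser bounds, which matches the intuition that less visible pixels tolerate more compression error.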