Compressive spectral imaging fusion (CSIF) has recently attracted attention as a methodology for improving spatial and spectral resolution simultaneously. Jointly designing the coded apertures and a deep neural network that performs the fusion is the state of the art for CSIF. However, existing results are limited to simulations, in which implementation complications such as the calibration and adjustment of the fusion method are skipped. Therefore, this paper presents an efficiently assembled prototype for CSIF. In particular, implementation details such as pixel mismatch and manufacturing noise are taken into account during training to reduce calibration problems. Furthermore, the network is re-trained using captured ground-truth images and the calibrated sensing matrices. Real fusion results obtained with the testbed implementation validate the proposed fusion system.