Our goal is to align objects in an RGB‐D image with 3D models from a library. Our pipeline detects and segments objects and estimates their coarse pose using a convolutional neural network, then inserts the rendered model into the scene.
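To make the order of the stages concrete, the pipeline can be sketched as a simple composition of three steps. This is only an illustrative skeleton under stated assumptions: the names `Detection`, `Placement`, `align_models`, and the three stage callables are hypothetical and do not come from our implementation, and the representation of a pose is deliberately left open.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


# NOTE: every name in this sketch is a hypothetical placeholder.
@dataclass
class Detection:
    label: str                    # object category, e.g. "chair"
    mask: List[Tuple[int, int]]   # pixel coordinates of the segmented instance


@dataclass
class Placement:
    detection: Detection
    pose: Any                     # coarse pose; its representation is an assumption
    inserted: Any                 # whatever the insertion stage returns


def align_models(
    rgbd_image: Any,
    detect_and_segment: Callable[[Any], Sequence[Detection]],
    estimate_coarse_pose: Callable[[Any, Detection], Any],
    insert_model: Callable[[Any, Detection, Any], Any],
) -> List[Placement]:
    """Run the three stages in order for every detected object."""
    placements = []
    for det in detect_and_segment(rgbd_image):
        # Stage 2: coarse pose estimate for this segmented instance.
        pose = estimate_coarse_pose(rgbd_image, det)
        # Stage 3: insert the rendered library model, starting from that pose.
        inserted = insert_model(rgbd_image, det, pose)
        placements.append(Placement(det, pose, inserted))
    return placements
```

Each stage is passed in as a callable, so a detector, a pose network, and an insertion routine can be swapped independently. With stub stages, `align_models(image, detect, pose_fn, insert_fn)` returns one `Placement` per detection, in detection order.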