This paper proposes an image indexing method for retrieving related images under the query-by-example paradigm. The proposed strategy exploits multimodal interactions between text annotations and visual content in the image database to build a semantic index. We achieve this by using a Non-negative Matrix Factorization algorithm to construct a latent semantic space in which visual features and text terms are represented together. The proposed system was evaluated on a standard benchmark dataset. The experimental evaluation shows a significant improvement in system performance when using the proposed multimodal indexing approach.
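
To make the idea of a shared latent space concrete, the following is a minimal sketch and not the paper's actual implementation. It assumes that visual features and text terms are stacked into one non-negative matrix with one column per image, and that the factorization uses the standard multiplicative updates for the Frobenius objective; the abstract does not say which NMF variant is used. The function names (`nmf`, `project`, `rank_images`), the toy data, and the cosine ranking are all hypothetical choices made for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize non-negative X (features x images) as W @ H.

    Uses multiplicative updates (an assumed choice of NMF algorithm).
    """
    rng = np.random.default_rng(seed)
    n_features, n_items = X.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_items)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(W_part, x, n_iter=200, eps=1e-9):
    """Find non-negative h with x ~= W_part @ h, keeping W_part fixed."""
    h = np.full(W_part.shape[1], 1.0 / W_part.shape[1])
    WtW = W_part.T @ W_part
    Wtx = W_part.T @ x
    for _ in range(n_iter):
        h *= Wtx / (WtW @ h + eps)
    return h

def rank_images(H, h_query, eps=1e-9):
    """Rank indexed images by cosine similarity in the latent space."""
    Hn = H / (np.linalg.norm(H, axis=0, keepdims=True) + eps)
    q = h_query / (np.linalg.norm(h_query) + eps)
    return np.argsort(-(q @ Hn))

# Toy data (hypothetical): 12 images, 8 visual features, 5 text terms.
rng = np.random.default_rng(42)
n_visual, n_terms, n_images, rank = 8, 5, 12, 3
V = rng.random((n_visual, n_images))                        # visual features
T = (rng.random((n_terms, n_images)) > 0.6).astype(float)   # term occurrences

# Stack both modalities so each latent factor mixes visual and text rows.
X = np.vstack([V, T])
W, H = nmf(X, rank)

# Query by example: the query image has visual features only, so it is
# projected using the visual rows of W and compared with the indexed images.
W_visual = W[:n_visual]
h_q = project(W_visual, V[:, 0])
ranking = rank_images(H, h_q)
print(ranking[:5])
```

In this sketch the columns of `H` play the role of the semantic index: each image is described by its weights over latent factors learned from both modalities, while a query without annotations is mapped into the same space through the visual part of the basis.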