Computational systems are now commonly built from nodes that combine multicore processors with coprocessors of different types, typically several GPUs or MICs. Software developed and optimized for simpler systems must be redesigned and re-optimized for these more complex systems. This work analyzes how auto-tuning techniques for basic linear algebra routines can be adapted to hybrid multicore+multiGPU and multicore+multiMIC systems. It studies the matrix-matrix multiplication kernel, which is optimized for the different components of a computational system through guided experimentation. This basic kernel is in turn used inside higher-level routines, which delegate their efficient execution to the optimized lower-level routine. Experimental results are satisfactory on different multicore+multiGPU and multicore+multiMIC systems. The guided search for execution configurations with satisfactory execution times thus proves to be a useful tool for heterogeneous systems, whose complexity makes it difficult to use highly efficient routines and libraries correctly.
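To make the idea of a guided search over execution configurations more concrete, the following minimal sketch distributes the rows of a matrix-matrix multiplication among compute units and moves work from the slowest unit to the fastest one while the overall time improves. It is an illustration under stated assumptions, not the procedure used in this work: the names `guided_split_search` and `toy_time` are hypothetical, and `toy_time` is a made-up linear cost model standing in for real timing experiments on a CPU or coprocessor.

```python
from typing import Callable, List, Sequence


def guided_split_search(
    n_rows: int,
    initial: Sequence[int],
    measure: Callable[[int, int], float],
    min_step: int = 1,
) -> List[int]:
    """Hill-climb a row split so that the slowest unit finishes earlier.

    measure(unit, rows) returns the (measured or modeled) time that
    `unit` needs to compute `rows` rows of the product. Hypothetical API.
    """
    split = list(initial)
    if sum(split) != n_rows:
        raise ValueError("initial split must cover all rows")

    def makespan(s: Sequence[int]) -> float:
        # All units work concurrently, so the slowest one sets the time.
        return max(measure(u, r) for u, r in enumerate(s))

    best = makespan(split)
    step = max(n_rows // 8, min_step)
    while step >= min_step:
        times = [measure(u, r) for u, r in enumerate(split)]
        slow = times.index(max(times))
        fast = times.index(min(times))
        moved = min(step, split[slow])
        candidate = list(split)
        candidate[slow] -= moved  # take rows from the slowest unit
        candidate[fast] += moved  # give them to the fastest unit
        cand_time = makespan(candidate)
        if moved > 0 and cand_time < best:
            split, best = candidate, cand_time  # keep the improvement
        else:
            step //= 2  # no gain at this granularity: refine the step
    return split


def toy_time(unit: int, rows: int) -> float:
    """Assumed cost model: unit 1 is three times faster than unit 0."""
    speeds = (1.0, 3.0)
    return rows / speeds[unit]


if __name__ == "__main__":
    print(guided_split_search(1200, [600, 600], toy_time))
```

In a real setting, `measure` would time an actual matrix multiplication call on each unit rather than evaluate a model, and the search would typically also cover other parameters such as the number of threads per unit.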