This paper describes a deep learning approach for urban land cover classification in the context of the ISPRS 2D semantic labelling benchmark. A high spatial resolution digital surface model (DSM) and a true ortho-image over the city of Potsdam (Germany) were used as the input dataset for obtaining six target classes. The proposed approach focuses on augmenting the original input dataset with a combined set of geo-morphometric variables extracted from the DSM (including slope/aspect transformation, second derivative of elevation, compound topographic index and hierarchical slope position). Furthermore, it uses an advanced deep learning architecture provided by the H2O framework, which follows the model of multi-layer, feedforward neural networks for predictive modelling. Automatic hyperparameter tuning with random search was conducted for model selection. The method comprises four steps: (i) spectral segmentation of ortho-images; (ii) extraction of relevant geo-morphometric variables from the DSM; (iii) multivariate land cover classification; and (iv) accuracy assessment. The proposed approach was used for classifying a selected ISPRS benchmark tile for which a reference map is available. The thematic accuracy of the proposed approach was assessed using the traditional error matrix and compared with the thematic accuracy of a deep learning classification based only on the original dataset (i.e. DSM and multispectral imagery). In addition, the deep learning classification approach was compared with a random forest (RF) classification using both the original and the augmented input datasets. It is shown that: (i) thematic accuracy improves only slightly when geomorphological variables are used to enhance the input dataset; and (ii) deep neural networks provide predictive power similar to that of random forests for urban remote sensing applications.
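
As an illustration of the augmentation idea only, the following is a minimal sketch of deriving two of the geo-morphometric variables named above (slope and aspect) from a DSM. It is not the implementation used in this work. It assumes the DSM is a 2D NumPy array with square cells, rows running north to south and columns running west to east; the function name `slope_aspect` and the parameter `cell_size` are hypothetical.

```python
import numpy as np

def slope_aspect(dsm, cell_size=1.0):
    """Return slope and aspect, both in degrees, for a 2D elevation array.

    Assumptions (not taken from the paper): square cells of side
    `cell_size`, row index increasing southward, column index
    increasing eastward.
    """
    # np.gradient returns the derivative along axis 0 (rows, southward)
    # first, then along axis 1 (columns, eastward).
    dz_south, dz_east = np.gradient(np.asarray(dsm, dtype=float), cell_size)

    # Slope: angle between the surface and the horizontal plane.
    slope = np.degrees(np.arctan(np.hypot(dz_east, dz_south)))

    # Aspect: compass bearing of the downslope direction, measured
    # clockwise from north. The downslope vector is (-dz_east, dz_south)
    # in (east, north) components. Aspect is not meaningful on flat cells.
    aspect = np.mod(np.degrees(np.arctan2(-dz_east, dz_south)), 360.0)
    return slope, aspect

# Toy DSM rising by one unit per cell toward the east.
dsm = np.tile(np.arange(5.0), (4, 1))
slope, aspect = slope_aspect(dsm)

# Stack the derived layers with the original DSM as extra feature bands.
features = np.dstack([dsm, slope, aspect])
```

On the toy surface, every cell has a 45 degree slope facing west (aspect 270). In a real workflow, the stacked layers would be summarised per image segment before classification; that step is omitted here.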