Cellular reprogramming, understood as the artificial conversion of one cell type into another, could potentially be used for the prevention and cure of complex diseases such as cancer or neurodegenerative disorders. However, identifying efficient reprogramming targets and strategies with classical wet-lab experiments is hindered by lengthy time commitments and high costs. To address these issues, in silico methods are considered. In this study, we formulate a novel control problem in the context of cellular reprogramming for the Boolean network (BN) and probabilistic Boolean network (PBN) computational models of gene regulatory networks [1]. To solve this problem, we develop pbn-STAC, a novel computational framework based on deep reinforcement learning (DRL) [2] that facilitates the identification of reprogramming strategies for large BN/PBN models. To address the computational complexity of finding stable states, i.e., configurations that correspond to cell types or cellular phenotypic states, we introduce the notion of a pseudo-attractor and devise a procedure for identifying pseudo-attractor states during DRL training. Given source and target (pseudo-)attractor states, pbn-STAC finds control targets and strategies that drive the network from the source to the target by intervening only in intermediate attractor states, which correspond to phenotypic cellular states observable in wet-lab practice. We evaluate the performance of pbn-STAC on a number of models of various sizes and on a biological case study, i.e., a published BN model of the immune response against B. bronchiseptica infection. We compare the results obtained with our approach with exact, optimal solutions wherever possible and show the effectiveness of pbn-STAC in identifying control targets and strategies. We consider our framework a contribution towards the ultimate aim of developing scalable control methods for large models of gene regulatory networks.
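To make the underlying notions concrete, the following is a minimal sketch of a synchronous Boolean network and of detecting frequently revisited states along simulated trajectories. The paper's actual pseudo-attractor definition and identification procedure are not given in the abstract, so the three-gene update rules, the function names, and the revisit-frequency heuristic below are all illustrative assumptions, not pbn-STAC's method.

```python
import random

def make_bn(update_fns):
    """Synchronous Boolean network: each gene's next value is a
    function of the full current state (a tuple of 0/1 values)."""
    def step(state):
        return tuple(f(state) for f in update_fns)
    return step

# A toy 3-gene network (hypothetical update rules, for illustration only).
rules = [
    lambda s: s[1] & s[2],  # gene 0 needs both gene 1 and gene 2 on
    lambda s: s[0] | s[2],  # gene 1 is activated by gene 0 or gene 2
    lambda s: 1 - s[0],     # gene 2 is repressed by gene 0
]
step = make_bn(rules)

def pseudo_attractor_states(step, n_genes, n_runs=50, horizon=100,
                            tail=20, threshold=0.5):
    """Heuristic stand-in for pseudo-attractor identification:
    collect states revisited in the tail of many simulated runs
    started from random initial states, and keep those that recur
    in at least a `threshold` fraction of runs."""
    counts = {}
    for _ in range(n_runs):
        state = tuple(random.randint(0, 1) for _ in range(n_genes))
        for t in range(horizon):
            state = step(state)
            if t >= horizon - tail:  # only count the trajectory tail
                counts[state] = counts.get(state, 0) + 1
    cutoff = threshold * n_runs
    return {s for s, c in counts.items() if c >= cutoff}
```

For this deterministic toy network every trajectory falls into a single limit cycle of five states within a couple of steps, so the heuristic recovers exactly those cycle states; in a PBN, where update functions are chosen stochastically, the same frequency-based tail statistics would be collected over the randomized dynamics.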