This paper presents an architecture built from computer vision algorithms to automatically count people passing through a portal. The system is divided into four stages: capture, detection, tracking, and counting. First, a top view of the portal is captured with an RGB camera and analyzed using a state-of-the-art deep neural network for object detection. The tracking algorithm is activated when a person is detected and automatically identifies that person's movement pattern. Then, rules for entering and leaving the portal are defined over the image area to perform the counting. The architecture is tested using videos collected from public buses. The results show an average precision of 96.8% and a recall of 92.0%. In addition, a comparison with a previous counting method for this scenario is presented. The proposed approach obtains an F1-score of 94.3%, compared with 89.4% for the previous method, indicating an improvement in system performance. Finally, the complete system is installed at the gate of a building.
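
To illustrate the counting stage, the entry and exit rules can be sketched as a single virtual line drawn across the top-view image. This is a minimal sketch, not the implementation evaluated in the paper. It assumes each tracked person is reduced to a sequence of centroid y-coordinates, and that larger y values lie on the inside of the portal. The names `count_crossings`, `f1_score`, and `line_y` are hypothetical. The second function computes the harmonic mean of precision and recall, which for the reported values (96.8% and 92.0%) is consistent with the reported F1-score of 94.3%.

```python
from typing import Iterable, List, Tuple


def count_crossings(tracks: Iterable[List[float]], line_y: float) -> Tuple[int, int]:
    """Count entries and exits from per-person centroid tracks.

    Assumption (not from the paper): image y grows toward the inside of
    the portal, so a track that starts above the line (y < line_y) and
    ends on or below it (y >= line_y) is an entry; the reverse is an exit.
    """
    entries = 0
    exits = 0
    for ys in tracks:
        if len(ys) < 2:
            # A single observation cannot show a crossing.
            continue
        start_inside = ys[0] >= line_y
        end_inside = ys[-1] >= line_y
        if not start_inside and end_inside:
            entries += 1
        elif start_inside and not end_inside:
            exits += 1
        # Tracks that start and end on the same side are not counted.
    return entries, exits


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative tracks only; these are not data from the paper.
    demo_tracks = [
        [10.0, 40.0, 80.0],  # crosses the line inward
        [90.0, 60.0, 20.0],  # crosses the line outward
        [10.0, 20.0, 30.0],  # never reaches the line
    ]
    print(count_crossings(demo_tracks, line_y=50.0))
    print(round(f1_score(0.968, 0.920), 3))
```

Comparing only the first and last positions of a track keeps a person who lingers near the line from being counted more than once, at the cost of missing someone who enters and leaves within the same track.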