The purpose of this study is to determine whether the advantage of deep-learned features over hand-crafted ones, as reported in the state of the art, still holds for actions carried out in a similar environment, as is common in real applications. The comparison is performed on a dataset created specifically for the study, in which the actions are very similar to one another and share a common, noisy environment. The study shows that, for a database with a limited number of videos and a common environment, hand-crafted features are preferable to a shallow CNN architecture used as a feature extractor.