This paper addresses acoustic event recognition, an active research area with applications in security, medicine, and entertainment. The study explores the design space of combining convolutional neural networks (CNNs) with Mel-Frequency Cepstral Coefficients (MFCCs) for audio feature extraction. Specifically, the work investigates three acoustic events relevant to citizen security: gunshots, screams, and sirens. We aim to find the combination of hyperparameters that yields accurate models with low computational requirements. The proposed approach achieved F1-scores of 95% for sirens, 97.2% for gunshots, and 99% for screams. Furthermore, given the low computational complexity of the resulting models, our results indicate that they are suitable for real-time acoustic event recognition systems in citizen security applications.
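
As a rough illustration of this kind of pipeline, the sketch below extracts MFCC features with librosa and feeds them to a small Keras CNN classifier for the three event classes. The number of MFCC coefficients, the fixed frame count, the layer sizes, and the training settings are illustrative assumptions and do not reproduce the hyperparameter configurations evaluated in the paper.

```python
# Illustrative sketch (not the paper's exact configuration):
# MFCC features fed to a small CNN for three acoustic event classes.
import numpy as np
import librosa
import tensorflow as tf

N_MFCC = 40          # number of MFCC coefficients (assumed value)
MAX_FRAMES = 128     # fixed number of time frames per clip (assumed value)
CLASSES = ["gunshot", "scream", "siren"]

def extract_mfcc(path, sr=22050):
    """Load an audio clip and return a fixed-size MFCC matrix."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    # Pad or truncate along the time axis so every clip has the same shape.
    if mfcc.shape[1] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, MAX_FRAMES - mfcc.shape[1])))
    return mfcc[:, :MAX_FRAMES]

def build_model():
    """A small CNN over the (coefficients x frames) MFCC 'image'."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MFCC, MAX_FRAMES, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(len(CLASSES), activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training call would follow, e.g.:
# model.fit(X_train[..., np.newaxis], y_train, validation_split=0.2)
```

Keeping the network this shallow is one way to meet the low computational requirements mentioned above; the actual architecture and feature settings used in the study are determined by its hyperparameter search.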