Unlike authentication systems, in which users cooperate to be correctly identified, in forensic settings it is common for a speaker to deliberately utter the message incorrectly (e.g., with fake intonation), so that the verification task fails. We propose a system that overcomes this problem. Since a suspect will not voluntarily provide audio recordings for training, our solution trains and validates a machine learning model based on neural networks (NN) using indirect features: the inputs are not obtained directly from the audio recordings, but from a quantitative comparison between pairs of recordings, of the same or different speakers, across several messages. The wavelet coherence matrix computed between each pair of recordings was used for training, cross-validation, and testing. The NN is tuned in terms of the number of hidden neurons, the learning rate, and the number of iterations. According to the results, our system achieves an overall accuracy (OA) of about 88.2%, precision of 84.5%, recall of 90%, F1 score of 87.2%, and AUC of 93.8%.
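The abstract does not name the tooling used, so the following is only a minimal sketch of the pairwise pipeline it describes: it assumes the `pycwt` package for wavelet coherence and scikit-learn's `MLPClassifier` as a stand-in for the paper's NN, with the three tuned quantities (hidden neurons, learning rate, iterations) exposed as `hidden_layer_sizes`, `learning_rate_init`, and `max_iter`. The recordings, labels, and hyperparameter values are synthetic placeholders, not the paper's data or settings.

```python
# Hypothetical sketch of the pairwise wavelet-coherence pipeline.
import numpy as np
import pycwt  # assumed library for wavelet coherence (pip install pycwt)
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def coherence_features(x, y, fs=16000, n_bins=64):
    """Wavelet coherence matrix between two recordings, reduced to a
    fixed-length feature vector by averaging over time."""
    wct, _, _, freqs, _ = pycwt.wct(x, y, dt=1.0 / fs, sig=False)
    profile = wct.mean(axis=1)  # mean coherence per scale
    # Subsample the per-scale profile to a fixed length so every
    # pair yields a feature vector of the same size.
    idx = np.linspace(0, len(profile) - 1, n_bins).astype(int)
    return profile[idx]

# Placeholder data: pairs of "recordings" with labels
# (1 = same speaker, 0 = different speakers).
rng = np.random.default_rng(0)
pairs = [(rng.standard_normal(2000), rng.standard_normal(2000))
         for _ in range(100)]
labels = rng.integers(0, 2, size=len(pairs))

X = np.array([coherence_features(a, b) for a, b in pairs])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3,
                                          random_state=0)

# NN tuned on hidden neurons, learning rate, and iteration count,
# mirroring the three hyperparameters mentioned in the abstract.
clf = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=1e-3,
                    max_iter=500, random_state=0).fit(X_tr, y_tr)

y_pred = clf.predict(X_te)
y_score = clf.predict_proba(X_te)[:, 1]
print("OA:       ", accuracy_score(y_te, y_pred))
print("Precision:", precision_score(y_te, y_pred))
print("Recall:   ", recall_score(y_te, y_pred))
print("F1:       ", f1_score(y_te, y_pred))
print("AUC:      ", roc_auc_score(y_te, y_score))
```

Averaging the coherence matrix over time is one simple way to obtain a fixed-length input for the NN regardless of recording length; the paper may well use a different reduction of the wavelet coherence matrix.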