FAST VARIABLE SELECTION FOR MASS SPECTROMETRY ELECTRONIC NOSE APPLICATIONS
Spanish title: RÁPIDA SELECCIÓN DE VARIABLES PARA APLICACIONES DE NARICES ELECTRÓNICAS BASADAS EN ESPECTROMETRÍA DE MASAS
High dimensionality is inherent to MS-based electronic nose applications, where hundreds of variables (m/z fragments) are available per measurement and a significant number of them are highly correlated or noisy. Feature selection is therefore an unavoidable pre-processing step if robust and parsimonious pattern classification models are to be developed. This article introduces a new strategy for feature selection and demonstrates its performance on two MS e-nose databases. The selection is conducted in three steps. The first step removes noisy, non-informative features, and the second removes highly collinear (i.e., redundant) ones. These two steps are computationally inexpensive and reduce the number of variables dramatically: nearly 80% of the initially available features are eliminated after the second step. The third step uses a stochastic variable selection method (simulated annealing) to reduce the number of variables further. For example, applying the method to an Iberian ham database reduced the number of features from 209 to 14. Using the surviving m/z fragments, a fuzzy ARTMAP classifier sorted ham samples according to producer and quality (an 11-category classification) with a 97.24% success rate. The whole feature selection process runs in a few minutes on a Pentium IV PC platform.

Resumen (translated from Spanish): High dimensionality is inherent to MS-based electronic nose applications, where hundreds of variables per measurement can be found, a significant number of which contribute noise or are highly correlated with one another. In this article, a new variable selection strategy is developed, with good results on two MS-based electronic nose databases. The process was carried out in three steps. The first two steps aim to eliminate noise and highly collinear information (redundancy), respectively. The third step uses a stochastic selection method (simulated annealing) to reduce the number of variables significantly. The complete selection process ran in a few minutes on a Pentium IV platform.
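The three-step procedure described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the actual filtering criteria, so the variance threshold `min_var`, the correlation threshold `max_abs_r`, the cost function with its `penalty` term, and the cooling schedule are all assumptions. A leave-one-out 1-nearest-neighbour classifier stands in for the fuzzy ARTMAP classifier, and the data are synthetic.

```python
import math
import random

def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)

def drop_low_variance(X, min_var):
    # Step 1 (assumed criterion): keep columns whose variance exceeds min_var.
    return [j for j, c in enumerate(zip(*X)) if variance(c) > min_var]

def drop_collinear(X, keep, max_abs_r):
    # Step 2 (assumed criterion): greedily keep a column only if its |r|
    # with every column kept so far stays below max_abs_r.
    cols = list(zip(*X))
    kept = []
    for j in keep:
        if all(abs(pearson(cols[j], cols[k])) < max_abs_r for k in kept):
            kept.append(j)
    return kept

def loo_accuracy(X, y, feats):
    # Leave-one-out accuracy of a 1-NN classifier (stand-in for fuzzy ARTMAP).
    if not feats:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[f] - X[k][f]) ** 2 for f in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)

def anneal(X, y, candidates, steps=200, t0=0.1, cooling=0.98,
           penalty=0.01, seed=0):
    # Step 3: simulated annealing over on/off masks of the candidate features.
    rng = random.Random(seed)

    def cost(mask):
        feats = [f for f, on in zip(candidates, mask) if on]
        return (1.0 - loo_accuracy(X, y, feats)) + penalty * len(feats)

    mask = [True] * len(candidates)
    cur = cost(mask)
    best_mask, best = mask[:], cur
    t = t0
    for _ in range(steps):
        trial = mask[:]
        i = rng.randrange(len(trial))
        trial[i] = not trial[i]          # flip one feature in or out
        c = cost(trial)
        # Always accept improvements; accept worse moves with probability
        # exp(-(c - cur) / t), which shrinks as the temperature t cools.
        if c <= cur or rng.random() < math.exp((cur - c) / t):
            mask, cur = trial, c
            if cur < best:
                best_mask, best = mask[:], cur
        t *= cooling
    return [f for f, on in zip(candidates, best_mask) if on]

# Synthetic two-class data: informative, near-duplicate, constant, noise.
data_rng = random.Random(1)
X, y = [], []
for label in (0, 1):
    for _ in range(15):
        signal = label * 3.0 + data_rng.gauss(0, 0.3)
        X.append([signal, 2.0 * signal + data_rng.gauss(0, 0.01),
                  5.0, data_rng.gauss(0, 1.0)])
        y.append(label)

step1 = drop_low_variance(X, min_var=1e-6)
step2 = drop_collinear(X, step1, max_abs_r=0.95)
selected = anneal(X, y, step2)
print(step1, step2, selected)
```

The two filters run first because they are cheap: each needs only one pass over the columns or over column pairs, so the stochastic search, which re-evaluates a classifier at every iteration, has a much smaller space of subsets to explore.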