Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Correctly assigning these categories is essential to ensure that relevant software can be easily retrieved by developers from large repositories. State of the art approaches either rely on the availability of the source code, or use supervised machine learning approaches, which require a set of already labeled software as training data. These restrictions make current approaches fail when such information is not available. We propose a novel approach, which overcomes these limitations by using semantic information recovered from byte code and an unsupervised algorithm to assign categories to software systems. We evaluated our approach in a study on the Apache Foundation Repository of Java libraries and the results indicate that our approach is able to correctly identify a correct category for 86% of the libraries.
Tópico:
Web Data Mining and Analysis
Citaciones:
6
Citaciones por año:
Altmétricas:
0
Información de la Fuente:
FuenteInternational Conference on Program Comprehension