Empirical Evaluation of Machine Learning Classifiers for Static Metric Software Defect Prediction

Authors

  • Ishrat Afreen, Narendra Parmar, Vishal Shrivastava, Gagan Sharma

Keywords:

Generalization, Confusion Matrix, Classification Rule Mining, NASA MDP Repository, Machine Learning Classifiers, Software Defect Prediction

Abstract

As software systems grow in size and in the density of their inter-module dependencies, maintaining software quality has become a major bottleneck in the contemporary software development lifecycle. Defects discovered late in development substantially increase implementation costs, delay schedules, and consume engineering hours. Early fault localization and predictive defect classification therefore act as key economic levers, allowing teams to plan testing and allocate limited validation resources where they are most needed. This paper presents a results-oriented comparison of data mining schemes for empirical Software Defect Prediction (SDP) using supervised classification. We describe the theoretical basis of several paradigms: Bayesian modelling, rule-based induction, logistic regression, and decision trees. Using hold-out validation on twelve high-dimensional legacy systems drawn from the public NASA Metric Data Program (MDP) repository, we benchmark eight well-known learners: Naive Bayes (NB), Logistic Regression (LOG), Decision Table (DT), OneR, PART, JRip, J48, and J48Graft. Evaluation on standard predictive measures (Accuracy, Sensitivity, Specificity, Balance, and area under the Receiver Operating Characteristic (ROC) curve) shows that the best-performing model varies considerably with the architectural characteristics of each system. The findings indicate that an effective defect prediction framework requires a modular, context-driven choice of classifier rather than a single fixed algorithm.
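The evaluation protocol summarized above can be made concrete with a short sketch. The Python below is a minimal illustration, not the authors' pipeline, and it does not reproduce the eight classifiers. It assumes each module is labelled 1 (defective) or 0 (clean). From a confusion matrix it computes Accuracy, Sensitivity (probability of detection, pd), Specificity (1 minus the false-alarm rate pf), and Balance. It then estimates ROC area with the pairwise Mann-Whitney formulation and performs a simple random hold-out split. Several details are assumptions rather than the paper's settings: the function names, the one-third test fraction, the fixed seed, and the Balance formula. That formula is the distance-from-ideal-point definition common in NASA MDP defect-prediction studies, and the paper's exact definitions may differ.

```python
import math
import random
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), assuming label 1 = defective, 0 = clean."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:  # predicted clean, actually defective
            fn += 1
    return tp, fp, tn, fn


def sdp_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, Sensitivity (pd), Specificity, pf and Balance from one confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    accuracy = (tp + tn) / total
    # Sensitivity = probability of detection (pd): share of defective modules flagged.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Specificity: share of clean modules correctly left unflagged; pf = 1 - specificity.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    pf = 1.0 - specificity
    # Assumed Balance definition: 1 minus the normalized Euclidean distance
    # from the ideal ROC point (pf = 0, pd = 1).
    balance = 1.0 - math.sqrt(pf ** 2 + (1.0 - sensitivity) ** 2) / math.sqrt(2.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "pf": pf, "balance": balance}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a random defective module
    scores higher than a random clean one (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def holdout_split(rows: Sequence, test_fraction: float = 1 / 3, seed: int = 0) -> tuple[list, list]:
    """Shuffle once and cut into (train, test); fraction and seed are illustrative."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

For example, with `y_true = [1, 0, 1, 0, 0, 1]` and `y_pred = [1, 0, 0, 0, 1, 1]`, the counts are TP = 2, FP = 1, TN = 2, FN = 1. Accuracy, Sensitivity, Specificity, and Balance therefore all equal 2/3. The quadratic pairwise loop in `roc_auc` is chosen for readability. A rank-based computation would be preferable for large module sets.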

How to Cite

Ishrat Afreen, Narendra Parmar, Vishal Shrivastava, Gagan Sharma. (2026). Empirical Evaluation of Machine Learning Classifiers for Static Metric Software Defect Prediction. International Journal of Research & Technology, 14(2), 1626–1632. Retrieved from https://ijrt.org/j/article/view/1503

Section

Original Research Articles