Mining GitHub for Novel Change Metrics to Predict Buggy Files in Software Systems

Code change metrics mined from source control repositories have proven to be the most reliable predictors of bugs in contemporary software engineering research. Yet no definitive modus operandi has been put forward for obtaining the required data from a particular software configuration management (SCM) repository. In this paper, we define a modus operandi for extracting some popular change metrics from the Eclipse repository on GitHub, which can be generalized to any open source GitHub repository.

We also define a few code change metrics that are intuitively significant for predicting bugs. In our experiments on five different versions of the Eclipse JDT project, bug prediction models built with these metrics, together with the existing prominent code change metrics, prove to be competent and consistent. We explored the Naïve Bayes Tree algorithm for building the prediction model and found that it performs better than other algorithms commonly used in this problem domain.
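
As a rough illustration of what mining change metrics from a Git repository can look like, the sketch below aggregates a few per-file metrics (number of revisions, lines added and deleted, distinct authors) from the text produced by `git log --numstat --format=@%an`. It is not the procedure defined in the paper: the choice of metrics, the `@` author marker, and the names `parse_numstat_log` and `SAMPLE_LOG` are assumptions made for this example, and renamed files are not tracked across their old and new paths.

```python
from collections import defaultdict


def parse_numstat_log(log_text):
    """Aggregate per-file change metrics from `git log --numstat --format=@%an` text.

    Hypothetical helper for illustration only; the metric set is an assumption.
    """
    metrics = defaultdict(
        lambda: {"revisions": 0, "added": 0, "deleted": 0, "authors": set()}
    )
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            # Commit header line: "@<author name>" (marker chosen for this sketch).
            author = line[1:].strip()
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines and anything that is not a numstat row
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # git reports "-" for binary files; they are skipped here
        entry = metrics[path]
        entry["revisions"] += 1
        entry["added"] += int(added)
        entry["deleted"] += int(deleted)
        entry["authors"].add(author)
    return dict(metrics)


# Made-up sample in the shape git emits: author line, blank line, numstat rows.
SAMPLE_LOG = (
    "@alice\n\n10\t2\tsrc/A.java\n3\t0\tsrc/B.java\n"
    "@bob\n\n1\t1\tsrc/A.java\n-\t-\tlogo.png\n"
)

if __name__ == "__main__":
    for path, entry in sorted(parse_numstat_log(SAMPLE_LOG).items()):
        print(path, entry["revisions"], entry["added"], entry["deleted"],
              len(entry["authors"]))
```

On a real repository, the input text could be captured by running the same `git log` command with `subprocess.run` and restricting it to the release interval of interest. Labelling files as buggy would additionally require linking commits to bug reports, which this sketch does not attempt.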
