The repository contains:
- Java implementation of online cross-project approaches proposed in "An Investigation of Cross-Project Learning in Online Just-In-Time Software Defect Prediction" (ICSE'20) and "Cross-Project Online Just-In-Time Software Defect Prediction" (TSE'22).
- Open-source datasets used for the experiments and hyper-parameter tuning.
- Sadia Tabassum (sxt901 at student dot bham dot ac dot uk)
- Leandro Minku (L dot L dot Minku at bham dot ac dot uk)
- Danyi Feng (danyi at ouchteam dot com)
- Sadia Tabassum
- MOA 2018.6.0
- JDK and JRE 1.8
- Go to the directory src/cpjitsdpexperiment
- There are 4 experiment files (ExpAIO, ExpFilter, ExpOPAIO and ExpOPFilter), one for each online CPJITSDP approach (AIO, Filter, OPAIO and OPFilter, respectively).
- Run the appropriate experiment file (e.g. cpjitsdpexperiment.ExpAIO.java).
Example command (can be found in the experiment files):
CpjitsdpAIO -l (spdisc.meta.WFL_OO_ORB_Oza -i 15 -s "+ens+" -t "+theta+" -w "+waitingTime+" -p "+paramsORB+") -s (ArffFileStream -f (/"+datasetsArray[dsIdx]+") -c 15) -e (FadingFactorEachClassPerformanceEvaluator -a 0.99) -f 1 -d results/results.csv"
- CpjitsdpAIO: Online CPJITSDP approach to run.
- -i 15 - the position of the Unix timestamp of the commit in the ARFF file
- -s - the ensemble size
- -t - the fading factor used for computing the class sizes
- -w - the waiting time for assuming the commit label is available
- -p - the parameters for the ORB.
- Values for -s, -t, -w and -p can be passed as arguments.
- Default values for -s, -t, -w and -p are 20, 0.99, 90 and 100;0.4;10;12;1.5;3, respectively.
MOA parameters:
- -l the machine learning algorithm to be used.
- -s (ArffFileStream -f -c ) is the data stream, with -f giving the path to the dataset in ARFF format and -c indicating the index of the class label in the dataset file.
- -e (FadingFactorEachClassPerformanceEvaluator -a ) is the performance evaluator to be used, with -a indicating the fading factor to be adopted.
- -d is the path to the output file where the results of the experiments will be saved.
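As a rough illustration of how the parameters above fit together, the sketch below assembles the example command as a single string. The class name, the method name and the `arffPath` argument are hypothetical and not part of the repository; the actual experiment files build the string from their own variables (`ens`, `theta`, `waitingTime`, `paramsORB`, `datasetsArray[dsIdx]`).

```java
// Hypothetical helper, not part of the repository: it only concatenates the
// options documented above into one task string.
final class CpjitsdpCommandSketch {

    static String buildAioCommand(String arffPath, int ens, double theta,
                                  int waitingTime, String paramsORB) {
        // -l: learner with the CPJITSDP-specific options (-i, -s, -t, -w, -p)
        String learner = "(spdisc.meta.WFL_OO_ORB_Oza -i 15"
                + " -s " + ens
                + " -t " + theta
                + " -w " + waitingTime
                + " -p " + paramsORB + ")";
        // -s: ARFF stream, with the class label at index 15
        String stream = "(ArffFileStream -f (" + arffPath + ") -c 15)";
        // -e: evaluator with fading factor 0.99, as in the example command
        String evaluator = "(FadingFactorEachClassPerformanceEvaluator -a 0.99)";
        return "CpjitsdpAIO -l " + learner
                + " -s " + stream
                + " -e " + evaluator
                + " -f 1 -d results/results.csv";
    }

    public static void main(String[] args) {
        // Default values listed above; the dataset path is an assumed placeholder.
        System.out.println(buildAioCommand(
                "datasets/tomcat.arff", 20, 0.99, 90, "100;0.4;10;12;1.5;3"));
    }
}
```

Keeping the learner, stream and evaluator in separate strings makes it easier to change one option without disturbing the nesting of the parentheses.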
@attribute fix {False,True}
@attribute ns numeric
@attribute nd numeric
@attribute nf numeric
@attribute entrophy numeric
@attribute la numeric
@attribute ld numeric
@attribute lt numeric
@attribute ndev numeric
@attribute age numeric
@attribute nuc numeric
@attribute exp numeric
@attribute rexp numeric
@attribute sexp numeric
@attribute contains_bug {False,True}
@attribute author_date_unix_timestamp numeric
@attribute project_no numeric
@attribute commit_type numeric
@data
- Attributes[1-14]: Software change metrics.
- Attribute[15]: True label of the commit (whether the commit is really defect-inducing or clean).
- Attribute[16]: Timestamp when the commit was submitted to the repository.
- Attribute[17]: Index of the project in datasetsArray; this index identifies a given project. The index of the target project must be passed as the argument dsIdx on the command line of the algorithm. For example, if the target project is Tomcat, dsIdx should be 0; if the target project is JGroups, dsIdx should be 1. datasetsArray contains the names of the datasets and needs to be defined in the experiment file (e.g. ExpAIO.java). The following datasetsArray is used in this paper:
datasetsArray = {"tomcat","JGroups","spring-integration", "camel","brackets","nova","fabric8", "neutron","npm","BroadleafCommerce" }
- Attribute[18]: commit_type is a number assigned based on the following data processing scenario:
For each commit x:
    If x is clean:
        Add an instance with:
            Software change metrics=Attributes[1-14], contains_bug=False, timestamp=[author_date_unix_timestamp],
            project_no=relevant project index, commit_type=0
        (The online cpjitsdp will use this instance as follows:
            If x is from target project:
                Test x as clean at timestamp=[author_date_unix_timestamp]
            For both target and cross-projects, train x as clean at timestamp=[author_date_unix_timestamp]+[W days (converted into unix_timestamp)])
    If x is buggy:
        If days_to_first_fix > W:
            Add an instance (which will be used for training) with:
                Software change metrics=Attributes[1-14], contains_bug=True,
                timestamp=[author_date_unix_timestamp]+[days_to_first_fix (converted into unix_timestamp)],
                project_no=relevant project index, commit_type=3
            If x is from target project:
                Add an instance (which will be used for training) with:
                    Software change metrics=Attributes[1-14], contains_bug=False,
                    timestamp=[author_date_unix_timestamp]+[W days (converted into unix_timestamp)],
                    project_no=relevant project index, commit_type=0
                Add an instance (which will be used for testing) with:
                    Software change metrics=Attributes[1-14], contains_bug=True, timestamp=[author_date_unix_timestamp],
                    project_no=relevant project index, commit_type=1
            If x is not from target project:
                Add an instance (which will be used for training) with:
                    Software change metrics=Attributes[1-14], contains_bug=False,
                    timestamp=[author_date_unix_timestamp]+[W days (converted into unix_timestamp)],
                    project_no=relevant project index, commit_type=4
        If days_to_first_fix <= W:
            Add an instance (which will be used for training) with:
                Software change metrics=Attributes[1-14], contains_bug=True,
                timestamp=[author_date_unix_timestamp]+[days_to_first_fix (converted into unix_timestamp)],
                project_no=relevant project index, commit_type=3
            If x is from target project:
                Add an instance (which will be used for testing) with:
                    Software change metrics=Attributes[1-14], contains_bug=True, timestamp=[author_date_unix_timestamp],
                    project_no=relevant project index, commit_type=2
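The scenario above could be sketched in Java roughly as follows. This is an illustrative reading of the pseudocode, not the preprocessing code used to build the datasets: the class, field and method names are hypothetical, the 14 software change metrics are omitted (they would be copied unchanged into every generated instance), and one day is assumed to be 86400 seconds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the commit_type assignment described above.
final class CommitTypeSketch {

    static final long SECONDS_PER_DAY = 24L * 60 * 60; // assumed conversion

    // Only the four trailing attributes are modelled here.
    static final class Instance {
        final boolean containsBug;
        final long timestamp;
        final int projectNo;
        final int commitType;

        Instance(boolean containsBug, long timestamp, int projectNo, int commitType) {
            this.containsBug = containsBug;
            this.timestamp = timestamp;
            this.projectNo = projectNo;
            this.commitType = commitType;
        }
    }

    // Expands one commit into the instances listed in the scenario.
    static List<Instance> expand(boolean buggy, long authorTs, double daysToFirstFix,
                                 int projectNo, int targetProjectNo, int waitingDays) {
        List<Instance> out = new ArrayList<>();
        boolean isTarget = projectNo == targetProjectNo;
        long waitTs = authorTs + waitingDays * SECONDS_PER_DAY;

        if (!buggy) {
            // Clean commit: a single instance with commit_type=0.
            out.add(new Instance(false, authorTs, projectNo, 0));
            return out;
        }

        // Buggy commit: a training instance at the time the fix is found (both branches).
        long fixTs = authorTs + Math.round(daysToFirstFix * SECONDS_PER_DAY);
        out.add(new Instance(true, fixTs, projectNo, 3));

        if (daysToFirstFix > waitingDays) {
            if (isTarget) {
                // Trained as clean after W days, tested as buggy at commit time.
                out.add(new Instance(false, waitTs, projectNo, 0));
                out.add(new Instance(true, authorTs, projectNo, 1));
            } else {
                // Cross-project: only the "clean after W days" training instance.
                out.add(new Instance(false, waitTs, projectNo, 4));
            }
        } else if (isTarget) {
            // Fix found within W days: only a test instance at commit time.
            out.add(new Instance(true, authorTs, projectNo, 2));
        }
        return out;
    }
}
```

For example, under these assumptions a buggy target-project commit with days_to_first_fix=100 and W=90 yields three instances with commit_type 3, 0 and 1, whereas the same commit from a cross project yields two instances with commit_type 3 and 4.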
After processing, the data must be sorted in ascending order of timestamp to maintain the chronology.
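A minimal sketch of this sorting step is shown below, assuming each data row is a comma-separated line in the attribute order of the ARFF header, so that the timestamp (Attribute[16]) sits at zero-based column 15. The class and method names are hypothetical, and the sketch does not handle quoted values that contain commas.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders ARFF data rows chronologically.
final class SortByTimestampSketch {

    static final int TIMESTAMP_COLUMN = 15; // Attribute[16], zero-based

    // Parsed via double so that values written as "1.5E9" are also accepted.
    static long timestampOf(String row) {
        return (long) Double.parseDouble(row.split(",")[TIMESTAMP_COLUMN].trim());
    }

    // Returns a new list sorted by ascending timestamp; the input is left untouched.
    // List.sort is stable, so rows with equal timestamps keep their relative order.
    static List<String> sortRows(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong(SortByTimestampSketch::timestampOf));
        return sorted;
    }
}
```

Only the rows after the @data line would be passed to `sortRows`; the header lines stay in place.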
Note: MOA is provided within this repo under the GPL 3 license. Online CPJITSDP makes use of the open-source code for ORB available at http://doi.org/10.5281/zenodo.2555695.