Regression Test Suite Study Using Classic Statistical Methods and Machine Learning



Introduction
This work makes a few assumptions and tries to keep things as simple as possible. The simplicity is deliberate: the final goal is a workable solution for industrial-scale projects. Some readers may find a few of the details debatable. The assumptions are:
• The more time a tester spends on a given project, the more effective the test cases developed by that tester become.
• While theoretically complex solutions tend to be appealing, their applications to industrial-scale projects are few and far between.
• The efficiency of hand-crafted test cases is related to the time the tester has spent on the given project.
• Where Machine Learning is used, the number of attributes (features) is kept minimal, to avoid running into dimensionality issues and to retain the effectiveness of the proposed solution.
The book by Abhinandan H. Patil lays the foundation for studying the relation between the effectiveness of a test suite and the number of product lines of code it traces, while keeping both the number of lines in each test case and the test execution time as small as possible [1]. For the benefit of readers, the metric is reproduced here:
$$Nm_i = \frac{LOCProd_i}{LOCTest_i \times T_i}$$
where:
LOCTesti is the number of lines of code in test case "i",
LOCProdi is the number of product lines of code traced by test case "i", and
Ti is the execution time of test case "i".
In plain English, the test case that traces the most product lines of code, with the fewest lines in itself and in the least time, is termed efficient.
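As an illustration only, the Nmi computation can be sketched in Python as follows. The class name TestCaseStats, its field names and the sample numbers are hypothetical, and measuring Ti in seconds is an assumption, since the source does not prescribe a unit.

```python
from dataclasses import dataclass


@dataclass
class TestCaseStats:
    """Per-test-case measurements (hypothetical names, not from the source)."""
    loc_test: int      # LOCTesti: lines of code in test case i
    loc_prod: int      # LOCProdi: product lines of code traced by test case i
    exec_time: float   # Ti: execution time of test case i (seconds assumed)


def nm(tc: TestCaseStats) -> float:
    """Nmi = LOCProdi / (LOCTesti x Ti); larger values mean a more efficient test case."""
    if tc.loc_test <= 0 or tc.exec_time <= 0:
        raise ValueError("LOCTesti and Ti must be positive")
    return tc.loc_prod / (tc.loc_test * tc.exec_time)


# Two made-up test cases tracing the same product code in the same time:
# the shorter test case scores four times higher.
compact = TestCaseStats(loc_test=50, loc_prod=1000, exec_time=2.0)
verbose = TestCaseStats(loc_test=200, loc_prod=1000, exec_time=2.0)
print(nm(compact), nm(verbose))  # 10.0 2.5
```

Because Nmi divides by both LOCTesti and Ti, halving either one doubles the score, which matches the plain-English reading given above.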
The author hints that this can be the starting point for many things, including the application of classical statistical techniques and Machine Learning to this type of data [1]. The present work picks up where the book left off, with what it left as an exercise for the reader. All tables in this work consist of rows in which each row holds the information for one test case, and each column is an attribute (feature) of that test case.

Historical Data to be Maintained between Successive Executions of the Test Cases
The work assumes that, where possible, the test team maintains the following information:
• Total months spent by the test case developer in testing the product.
• LOCTesti, LOCProdi and Ti, each retaining the meaning explained in section 1.
• Number of bugs uncovered by the test case.

Where:
Texp is the tester's experience, in months, on the given project;
LOCTesti, LOCProdi, Ti and Nmi retain their usual meaning;
Wbug is the weight assigned to the bugs uncovered.
We introduce Nm2i, which is simply a new, second metric. Let us define Nm2i as:

Classical Statistical Analysis
In the expected situation, classical statistical analysis would reveal a characteristic pattern: the following relations should be straightforward regression lines:
1) Relation between Texp and Nmi
2) Relation between Texp and bugs uncovered
3) Relation between Texp and Nm2i
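Each relation in the list above can be checked with an ordinary least-squares line. The Python sketch below fits y = a + b·x and reports R²; the function name fit_line and the sample values (loosely echoing experience levels of 2, 8, 16 and 24 months) are hypothetical, and a real study would use the measured Texp column against Nmi, bugs uncovered or Nm2i.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # Convention: if all y values are equal, the flat line fits perfectly.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r_squared


# Hypothetical data: tester experience (months) vs. average Nmi of their tests.
texp = [2, 8, 16, 24]
nm_avg = [1.0, 2.5, 4.5, 6.5]
print(fit_line(texp, nm_avg))  # (0.5, 0.25, 1.0)
```

A positive slope b with R² close to 1 is what the "straightforward regression line" expectation amounts to; real data would of course show scatter around the line.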
This is intuitive, expected behavior of the test suite, and it serves as the preamble to the work built up in what follows. Rather than employing these classical methods, the subsequent sections explore Machine Learning techniques.

Clustering of Data with Machine Learning
In clustering, the data are plotted on a two-dimensional graph in which each point is an instance. Clustering associates a cluster number with each instance; instances that share a number form one cluster.
Some clustering algorithms associate each instance with one and only one cluster. This is exclusive clustering.
Other clustering algorithms allow an instance to be associated with more than one cluster, which leads to Venn diagrams with overlapping clusters.
Still other algorithms produce hierarchical clusters, drawn as dendrograms, which are essentially tree structures.
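To make the dendrogram idea concrete, here is a naive agglomerative (bottom-up) clustering sketch in Python. It is a textbook illustration, not Weka's hierarchical clusterer; the function name agglomerate and the four sample points are hypothetical. Each merge step records which two clusters joined and at what distance; read from first to last, these steps are the tree that a dendrogram draws.

```python
import math


def agglomerate(points, linkage=min):
    """Bottom-up clustering; returns merge steps (id_a, id_b, distance).

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Original points have ids 0..n-1; each merge creates the next id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Find the closest pair of current clusters under the chosen linkage.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = linkage(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
for a, b, d in agglomerate(pts):
    print(a, b, round(d, 3))
# 0 1 1.0
# 2 3 1.0
# 4 5 6.403
```

Cutting the tree at a chosen distance (for example, below 6) yields exclusive clusters, which is how a dendrogram relates to the flat clusters described above.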
Unlike the classical approach, this section employs Machine Learning algorithms to cluster the data. Let us revisit Table 2: it holds all the information required to train a machine using Machine Learning algorithms. Since no class label is associated with each test case, we leave it to the machine to discover the grouping. We pass the 5 attributes of Table 2, excluding the derived attribute, to the Machine Learning algorithms, and the output is clustered data. In testers' standard terminology these clusters are "buckets". A quick pass through the buckets shows which of them have priority, so the clusters can be ranked. These rankings in turn help with test selection, prioritization, pruning and, through prioritization, reduction of Regression test execution time.
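One way to turn clusters into a test order is sketched below in Python. The text does not fix the ranking criterion, so ranking buckets by the mean Nmi of their members is our assumption (a mean bug weight would work the same way); the function names rank_buckets and prioritize and the sample assignments are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_buckets(assignments, scores):
    """Return cluster labels ordered from best to worst by mean member score.

    assignments: test id -> cluster label (e.g. taken from a clustering run)
    scores:      test id -> per-test score (assumed here to be Nmi)
    """
    members = defaultdict(list)
    for test_id, label in assignments.items():
        members[label].append(scores[test_id])
    return sorted(members, key=lambda label: mean(members[label]), reverse=True)


def prioritize(assignments, scores):
    """Order test ids: best bucket first, then by descending score inside a bucket."""
    rank = {label: r for r, label in enumerate(rank_buckets(assignments, scores))}
    return sorted(assignments, key=lambda t: (rank[assignments[t]], -scores[t]))


# Made-up clustering result and Nmi scores for four test cases.
clusters = {"tc1": 0, "tc2": 1, "tc3": 0, "tc4": 2}
nmi = {"tc1": 9.0, "tc2": 4.0, "tc3": 7.0, "tc4": 1.0}
print(rank_buckets(clusters, nmi))  # [0, 1, 2]
print(prioritize(clusters, nmi))    # ['tc1', 'tc3', 'tc2', 'tc4']
```

Truncating the prioritized order at a time or count budget gives a simple form of test selection or pruning; the cluster numbers themselves carry no ranking until a criterion like this is applied.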
The standard algorithms at our disposal are:
1) K-Means Clustering
2) X-Means Clustering
3) EM (Expectation-Maximization) Clustering
4) Cobweb Clustering
5) Hierarchical Clustering
We will not go into the details of how these algorithms work or how they are implemented; they come prepackaged in the library or tool that we use.
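For readers who want to see what the first algorithm in the list does, here is a compact Python sketch of K-means (Lloyd's algorithm) over the five attributes. It is not Weka's implementation. The min-max scaling step is our assumption, added so that large-valued attributes such as locprodi do not dominate the Euclidean distance; the function names, the fixed seed and the sample rows are hypothetical.

```python
import math
import random


def normalize(rows):
    """Min-max scale every attribute (column) to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(rows, k, max_iter=100, seed=1):
    """Lloyd's K-means: returns one cluster label (0..k-1) per row."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(rows, k)]  # k randomly chosen rows
    labels = []
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in rows]
        if new_labels == labels:
            break  # converged: no row changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up rows: testexp, loctesti, locprodi, ti, weightforbug.
data = [[2, 60, 300, 4.0, 1], [2, 55, 280, 4.2, 1],
        [24, 20, 900, 1.0, 5], [24, 22, 950, 1.1, 5]]
print(kmeans(normalize(data), k=2))
```

The first two rows should land in one cluster and the last two in the other; which of them is called 0 and which 1 is arbitrary, which is why the buckets still have to be ranked before they can drive prioritization.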

Weka as Machine Learning Tool
We use Weka as the Machine Learning tool for this study; however, any other tool could be used.
The authors prefer Weka for its simplicity, flexibility and capabilities.
Weka is a full-fledged tool written in Java and developed at the University of Waikato, New Zealand. It comes with an exceptionally well-documented manual to aid users. Weka can be invoked:
1) From the command line, in turn through shell scripts.
3) From external code, using Weka as a library.
Since the tool is open source, Weka can be extended for a specific purpose; however, users seldom need to do so. As mentioned earlier, the tool is full-fledged and still evolving.
Data Supplied to Weka
@RELATION testsuite
@ATTRIBUTE testexp REAL
@ATTRIBUTE loctesti REAL
@ATTRIBUTE locprodi REAL
@ATTRIBUTE ti REAL
@ATTRIBUTE weightforbug REAL
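The excerpt above shows only the ARFF header; in a complete ARFF file the header is followed by an @DATA section with one comma-separated row per test case, with values in attribute order. The Python sketch below produces such text from per-test-case rows. The function name to_arff and the sample row are hypothetical and are not the data used in this study.

```python
# Attribute names copied from the @ATTRIBUTE declarations above (all REAL).
ATTRIBUTES = ["testexp", "loctesti", "locprodi", "ti", "weightforbug"]


def to_arff(rows, relation="testsuite"):
    """Render per-test-case rows as ARFF text (header plus @DATA section)."""
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} REAL" for name in ATTRIBUTES]
    lines += ["", "@DATA"]
    for row in rows:
        if len(row) != len(ATTRIBUTES):
            raise ValueError(f"expected {len(ATTRIBUTES)} values, got {len(row)}")
        lines.append(",".join(str(float(v)) for v in row))
    return "\n".join(lines) + "\n"


# One made-up test case: 8 months' experience, 40 test LOC, 900 product LOC,
# 1.5 s execution time, bug weight 3.
text = to_arff([(8, 40, 900, 1.5, 3)])
print(text)
```

Saved with an .arff extension, the resulting text can be opened in Weka; the last line printed here is `8.0,40.0,900.0,1.5,3.0`.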
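$$\text{Effectiveness of the total test suite} = \sum_{i=1}^{n} Nm_i$$
$$\text{Effectiveness of the test suite per test case} = \frac{\sum_{i=1}^{n} Nm_i}{n}$$
where n is the number of test cases in the suite.
Table 1: Data Maintained for Nmi.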
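Both suite-level measures follow directly from the per-test-case Nmi values. A minimal Python sketch, in which the function name suite_effectiveness and the input values are hypothetical:

```python
def suite_effectiveness(nm_values):
    """Return (sum of Nmi, mean Nmi per test case) for a suite of n test cases."""
    n = len(nm_values)
    if n == 0:
        raise ValueError("the suite must contain at least one test case")
    total = sum(nm_values)
    return total, total / n


total, per_case = suite_effectiveness([10.0, 2.5])
print(total, per_case)  # 12.5 6.25
```

The total grows with every test case added, whereas the per-test-case figure divides by n, so it is the one to use when comparing suites of different sizes.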

Table 2: Data Maintained between Successive Executions.
A tester who has spent more time with the product tends to be well versed with the system, and his or her testing tends to produce efficiently crafted test cases [2][3][4]. The hypothetical data supplied to Weka tries to emulate a real test suite as far as possible, and it takes into consideration a few facts about the relation between the tester's experience and the effectiveness of the test case {LOCTesti, LOCProdi, Ti, BugsUncoveredWeight}.
We see that the test cases can be clustered into the required number of buckets. cluster0 tends to be the best cluster, followed by cluster1 and cluster2. cluster3 contains the test cases crafted by a novice tester; this is expected, as that tester is still at the starting point of the learning curve. Most of the test cases crafted by the tester with 8 months of experience are in cluster0. The test cases crafted by the tester with 16 months of experience are mostly in cluster2, and those crafted by the tester with 24 months of experience are shared between cluster1 and cluster2.
Labelling test cases on the basis of an attribute such as the tester's experience in testing the product may be controversial from some readers' perspective, but it is unavoidable, because the results can be analyzed from various perspectives; management, for example, could use the data from a Return on Investment (ROI) angle. This work consciously seeks to avoid such controversy.

Conclusion
While a few things are unavoidable, Data Science and Machine Learning have their own advantages. On the positive side, this work can be used for test selection, test suite prioritization, pruning and Regression test suite execution time reduction, among many other things [2][3][4].
Copyright: ©2024 Abhinandan H. Patil, et al.This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.https://opastpublishers.com/