Metrics to judge the Quality & Reliability of our Testing
During testing, the software under test is executed with a set of test cases. As the quality of the delivered software depends substantially on the quality of testing, two fundamental questions arise in the minds of “Technical Test Analysts” while testing:
Q 1: How good is the testing that has been done?
Q 2: What is the quality or reliability of software after testing is completed?
During testing, the primary purpose of metrics is to try to answer these and other related questions.
Three important metrics, or areas of interest, for ISTQB-certified experts such as “Technical Test Analysts” are as follows.
1) Coverage Analysis: One of the most common approaches used by “Technical Test Analysts” for evaluating the thoroughness of testing is to
use coverage measures. Some of the coverage measures commonly used in practice are
a) Statement coverage
b) Branch coverage.
To use these coverage measures for evaluating the quality of testing, proper coverage analysis tools have to be employed, which report not only the coverage achieved during testing but also which portions are not yet covered. Often, organizations build guidelines for the level of coverage that must be achieved during testing. Generally, the coverage requirement will be higher for unit testing but lower for system testing, as it is much more difficult to ensure execution of identified blocks when the entire system is being executed. The coverage requirement at the unit level can often be 90% to 100% (keeping in mind that 100% may not always be possible, as there may be unreachable code).
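In practice a coverage tool does the instrumentation and bookkeeping, but the underlying arithmetic is simple. The following sketch, with made-up statement and branch identifiers, shows how the percentages are computed and why unreachable code caps statement coverage below 100%:

```python
# A minimal sketch of computing statement and branch coverage percentages
# from hypothetical instrumentation data; the identifiers and numbers are
# illustrative, not taken from any particular coverage tool.

def coverage_percent(covered: set, total: set) -> float:
    """Percentage of items in `total` that appear in `covered`."""
    return 100.0 * len(covered & total) / len(total)

# Statements 1..10 exist; the test run executed all but 9 and 10
# (say, an unreachable error handler).
all_statements = set(range(1, 11))
executed = all_statements - {9, 10}
print(f"Statement coverage: {coverage_percent(executed, all_statements):.0f}%")  # 80%

# Branches identified as (statement, outcome) pairs; one outcome of the
# decision at statement 6 was never taken.
all_branches = {(3, True), (3, False), (6, True), (6, False)}
taken = {(3, True), (3, False), (6, True)}
print(f"Branch coverage: {coverage_percent(taken, all_branches):.0f}%")  # 75%
```

A real tool would also report which statements and branch outcomes remain uncovered, which is exactly the information needed to design additional test cases.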
Besides coverage of program constructs, coverage of requirements is also often examined. It is to facilitate this evaluation that the requirement or condition being tested is mentioned in the test case specification. This coverage is generally established by evaluating the set of test cases to ensure that a sufficient number of test cases with suitable data is included for all the requirements. The coverage measure here is the percentage of requirements or their clauses/conditions for which at least one test case exists. Often, full coverage may be required at the requirement level before testing is considered acceptable.
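When each test case in the specification records the requirements it exercises, requirements coverage can be checked mechanically. A small sketch, with invented test-case and requirement identifiers:

```python
# Sketch: requirements coverage from a test-case specification, where each
# test case lists the requirements it exercises. All identifiers are made up.

test_cases = {
    "TC-01": ["REQ-1"],
    "TC-02": ["REQ-1", "REQ-2"],
    "TC-03": ["REQ-3"],
}
requirements = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]

covered = {req for reqs in test_cases.values() for req in reqs}
uncovered = [r for r in requirements if r not in covered]
pct = 100.0 * sum(r in covered for r in requirements) / len(requirements)

print(f"Requirements coverage: {pct:.0f}%")  # 75%
print("Not yet covered:", uncovered)         # ['REQ-4']
```

If full requirements coverage is the acceptance bar, the `uncovered` list is the work remaining before testing can be considered complete.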
2) Reliability: After testing is done and the software is delivered, development is considered over. It is clearly desirable to know, in quantifiable terms, the reliability of the software being delivered. As reliability of software depends considerably on the quality of testing, by assessing reliability we can also judge the quality of testing. Alternatively, reliability estimation can be used to decide whether enough testing has been done. In other words, besides characterizing an important quality property of the product being delivered, reliability estimation has a direct role in project management: the “Test Analyst” or the “Project Manager” can use it to decide whether enough testing has been done and when to stop testing.
Reliability of a product specifies the probability of failure-free operation of that product for a given time duration. Most reliability models require that the occurrence of failures be a random phenomenon. In software, even though failures occur due to preexisting bugs, this assumption generally holds for larger systems, though it may not hold for small programs whose bugs are known (in which case one might be able to predict the failures). Hence, reliability modeling is more meaningful for larger systems.
Let X be the random variable that represents the life of a system. Reliability of a system is the probability that the system has not failed by time t.
In other words, R(t) = P(X > t).
The reliability of a system can also be specified as the mean time to failure (MTTF). MTTF represents the expected lifetime of the system.
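Given a log of observed times to failure, both quantities above can be estimated empirically. The sketch below uses invented lifetimes; with censored data (runs that ended without failure), a proper reliability model rather than a plain average would be needed:

```python
# Sketch: estimating MTTF and the empirical reliability R(t) = P(X > t)
# from a hypothetical list of observed times to failure (in hours).

times_to_failure = [120.0, 95.0, 210.0, 150.0, 175.0]

# MTTF is the expected lifetime; the sample mean is its simplest estimate.
mttf = sum(times_to_failure) / len(times_to_failure)

def reliability(t: float) -> float:
    """Empirical R(t): fraction of observed lifetimes exceeding t."""
    return sum(x > t for x in times_to_failure) / len(times_to_failure)

print(f"MTTF = {mttf:.0f} hours")          # 150 hours
print(f"R(100) = {reliability(100):.1f}")  # 0.8
```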
Reliability can also be defined in terms of failure intensity, which is the failure rate (i.e., the number of failures per unit time) of the software at time t. From the measurement perspective, the failure rate is the easiest to measure during testing, provided defects are being logged. A simple way to do this is to compute the number of failures every week or every day during the last stages of testing, with the number of defects logged approximating the number of failures. (Though failures and defects are different, in the last stages of testing it is assumed that defects that cause failures are fixed soon enough and therefore do not cause multiple failures.) Generally, this failure rate increases at the start of testing as more and more defects are found, peaks somewhere in the middle of testing, and then continues to drop as fewer defects are reported. For a given test suite, if all defects are fixed, then there should be almost no failures toward the end, and that could be considered the proper time to release the software. That is, a release criterion could be that the failure rate at release time is zero failures in some time duration, or zero failures while executing a test suite.
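The weekly tally described above is a one-liner over the defect log. A sketch with invented log dates, grouping defects by ISO week:

```python
# Sketch: approximating the failure rate as defects logged per week during
# the last stages of testing. The dates are invented for illustration.
from collections import Counter
from datetime import date

defect_log = [
    date(2024, 3, 4), date(2024, 3, 5), date(2024, 3, 6),  # ISO week 10
    date(2024, 3, 12), date(2024, 3, 14),                  # ISO week 11
    date(2024, 3, 20),                                     # ISO week 12
    # ISO week 13: nothing logged -> the zero-failures release criterion holds
]

per_week = Counter(d.isocalendar()[1] for d in defect_log)
for week in sorted(per_week):
    print(f"ISO week {week}: {per_week[week]} failures")
```

A declining trend in this tally, ending in a failure-free week, is the rough release signal the text describes.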
Though failure rate tracking gives a rough sense of reliability in terms of failures per day or per week, for more accurate reliability estimation, better models have to be used. Software reliability modeling is a complex task, requiring rigorous models and sophisticated statistical analysis. Many models have been proposed for software reliability assessment, and surveys of these models are available in the literature.
As failure of software also depends critically on the environment in which it is executing, failure rates experienced in testing will reflect the ultimate reliability experienced by the user after software release only if testing closely mimics the user behavior. This may not be the case, particularly with lower levels of testing. However, often at higher levels, active effort is made to have the final test suite mimic the actual usage. If this is the case, then reliability estimation can be applied with a higher confidence.
3) Defect Removal Efficiency: Another analysis of interest is defect removal efficiency, though it can only be determined some time after the software has been released. The purpose of this analysis is to evaluate the effectiveness of the testing process being employed, not the quality of testing for a particular project. This analysis is useful for improving the testing process in the future.
Usually, after the software has been released to the client, the client will find defects, which have to be fixed (generally by the original developer, as this is often part of the contract). This defect data is also generally logged. Within a few months, most of the defects would be uncovered by the client (often the “warranty” period is 3 to 6 months).
Once the total number of defects (or a close approximation to it) is known, the defect removal efficiency (DRE) of testing can be computed. The defect removal efficiency of a quality control activity is defined as the percentage reduction in the number of defects achieved by executing that activity. As an example, suppose the total number of defects logged is 500, of which 20 were found after delivery and 200 were found during system testing. The defect removal efficiency of system testing is 200/220 (just over 90%), as the total number of defects present in the system when system testing started was 220. The defect removal efficiency of the overall quality process is 480/500, which is 96%. Incidentally, this level of DRE is decent and is what many commercial organizations achieve.
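The arithmetic in the example above can be reproduced directly:

```python
# The DRE arithmetic from the worked example, reproduced as a sketch.

total_defects = 500           # all defects logged over the project's life
found_after_delivery = 20
found_in_system_testing = 200

# Defects present when system testing started: those it found plus those
# that escaped to the field.
present_at_system_testing = found_in_system_testing + found_after_delivery
dre_system_testing = 100.0 * found_in_system_testing / present_at_system_testing

# Overall quality process: everything except what escaped to the client.
dre_overall = 100.0 * (total_defects - found_after_delivery) / total_defects

print(f"DRE of system testing: {dre_system_testing:.1f}%")  # 90.9%
print(f"DRE of overall process: {dre_overall:.1f}%")        # 96.0%
```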
It should be clear that DRE is a general concept, which can be applied to any defect removal activity. For example, we can compute the DRE of design review, or unit testing. This can be done if for each defect, besides logging when and where the defect is found, the phase in which the defect was introduced is also analyzed and logged. With this information, when all the defects are logged, the DRE of the main quality control tasks can be determined. This information is extremely useful in improving the overall quality process.
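Applying the same definition across successive quality-control activities is a short loop over the defect log. In the sketch below the counts are invented, and for simplicity every defect is assumed to have been introduced before the first activity; with per-defect injection data, as the text recommends, defects injected after an activity would be excluded from its denominator:

```python
# Sketch: DRE per quality-control activity, assuming all defects were
# introduced before the first activity ran. Counts are hypothetical.

activities = ["design review", "unit testing", "system testing", "field"]
found_in = {"design review": 50, "unit testing": 30,
            "system testing": 15, "field": 5}

remaining = sum(found_in.values())  # defects present at the start
for act in activities[:-1]:         # "field" is the escape count, not an activity
    dre = 100.0 * found_in[act] / remaining
    print(f"DRE of {act}: {dre:.0f}% ({found_in[act]} of {remaining})")
    remaining -= found_in[act]      # defects left for the next activity
```

Note how the denominator shrinks at each stage: an activity is judged only against the defects still present when it starts, which is why late activities can have a high DRE despite removing fewer defects in absolute terms.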