characteristic that just happens as it grows.
At the planning stage, we need to set out the reliability objectives to be achieved and state how their achievement will be measured. This involves not only setting the end objective, but also considering how we expect reliability to gradually improve over time.
A commonly used time-based measure for reliability is the mean time between failures (MTBF), which is made up of the following two components:
1) The mean time to failure (MTTF), representing the actual time elapsed (in hours) between observed failures
2) The mean time to repair (MTTR), representing the number of hours needed by a developer to fix the problem
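As a rough illustration of how the two components combine, the following sketch averages a set of failure records and sums the results to obtain MTBF. The `FailureRecord` type, its field names, and the sample figures are hypothetical and chosen only for this example; they do not come from any particular tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FailureRecord:
    """One observed failure (hypothetical record layout, for illustration)."""
    hours_to_failure: float  # time elapsed before this failure was observed
    hours_to_repair: float   # developer time needed to fix the problem


def mtbf_components(records):
    """Return (MTTF, MTTR, MTBF) in hours, with MTBF = MTTF + MTTR."""
    mttf = mean(r.hours_to_failure for r in records)
    mttr = mean(r.hours_to_repair for r in records)
    return mttf, mttr, mttf + mttr


# Invented sample data: three failures observed during a test campaign.
records = [
    FailureRecord(hours_to_failure=100.0, hours_to_repair=2.0),
    FailureRecord(hours_to_failure=140.0, hours_to_repair=4.0),
    FailureRecord(hours_to_failure=120.0, hours_to_repair=3.0),
]
mttf, mttr, mtbf = mtbf_components(records)
print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.1f} h, MTBF = {mtbf:.1f} h")
# prints: MTTF = 120.0 h, MTTR = 3.0 h, MTBF = 123.0 h
```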
According to software testing expert Ilene Burnstein, we should be precise in our measurements, and CPU execution time is often a more appropriate measure than simple elapsed "wall clock" time. This allows planned downtimes and other disturbances to be taken into account and avoids calculating overly pessimistic values of reliability.
Ilene Burnstein describes a measure for reliability (R) that is based on MTBF and takes a value between 0 (totally unreliable) and 1 (completely reliable). R is calculated as MTBF divided by (1 + MTBF). The larger the value of MTBF (i.e., the further apart failures occur), the closer R gets to 1, although, significantly, it never reaches it.
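The behavior of R can be checked with a few lines of code. This is a minimal sketch of the formula as stated above, assuming MTBF is expressed in hours; the function name `reliability` and the sample MTBF values are illustrative only.

```python
def reliability(mtbf_hours: float) -> float:
    """Reliability measure R = MTBF / (1 + MTBF), as described in the text."""
    if mtbf_hours < 0:
        raise ValueError("MTBF cannot be negative")
    return mtbf_hours / (1.0 + mtbf_hours)


# R rises toward 1 as failures occur further apart, but never reaches it.
for mtbf in (0.0, 1.0, 9.0, 99.0, 999.0):
    print(f"MTBF = {mtbf:6.1f} h -> R = {reliability(mtbf):.4f}")
# R values printed: 0.0000, 0.5000, 0.9000, 0.9900, 0.9990
```

Because the result depends on the unit chosen for MTBF, R values are only comparable when the same unit is used throughout.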
If recoverability tests are included in our approach to reliability testing, it may be appropriate to define software testing objectives as follows:
We need many test repetitions to measure reliability levels. Tests to measure reliability levels are mostly conducted at the system test or (operational) acceptance test level. This is primarily because these test levels present more opportunity for executing the test cycle repetitions necessary to measure reliability levels accurately. The repetitious nature of these reliability tests also makes them good candidates for conducting dynamic analysis in parallel, particularly for detecting memory leaks.
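One way to piggyback a simple memory check on repeated test cycles is sketched below. It assumes the test cycle is a Python callable and uses the standard library's `tracemalloc` module to sample traced memory after each repetition; the function `memory_growth_per_cycle` and the deliberately leaky `leaky_cycle` are hypothetical names used only for illustration. A dedicated dynamic analysis tool would normally provide far more detail.

```python
import tracemalloc


def memory_growth_per_cycle(test_cycle, repetitions, warmup=1):
    """Run test_cycle repeatedly and return the traced memory growth (bytes)
    relative to the baseline after each repetition."""
    # Warm-up runs let one-time allocations (caches, lazy imports) settle.
    for _ in range(warmup):
        test_cycle()
    tracemalloc.start()
    baseline, _peak = tracemalloc.get_traced_memory()
    growth = []
    for _ in range(repetitions):
        test_cycle()
        current, _peak = tracemalloc.get_traced_memory()
        growth.append(current - baseline)
    tracemalloc.stop()
    return growth


# Hypothetical test cycle that leaks: it keeps a reference on every run.
_retained = []

def leaky_cycle():
    _retained.append(bytearray(100_000))


growth = memory_growth_per_cycle(leaky_cycle, repetitions=5)
# Steadily increasing values across repetitions suggest a leak.
print(growth)
```

A steadily rising series is only an indication; a flat series after the first few cycles is what we would hope to see for a healthy component.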
Tests aimed at measuring reliability levels can also be conducted in a highly controlled manner with a large number of test cases. If this approach is taken, it may be necessary to plan for a number of days for their execution and possibly the exclusive use of a software testing environment with a stable software configuration over that time frame.
It may be efficient to schedule tests of fault tolerance (robustness) at the same time as failover tests or even certain security tests since the required test inputs (e.g., exception conditions raised by the operating system) may be common.
The operational acceptance test (OAT) level is typically where procedural tests for backup and restoration are conducted. These tests are best scheduled together with the staff that will be responsible for actually performing the specified procedures in production.
Finally, the scheduling of any reliability tests (but in particular, failover tests) for a system of systems can present a technical and managerial challenge that should not be underestimated, especially if one or more components are outside of our direct control.