Reliability Testing and Assessment of Risks due to Poor Reliability



Reliability tests are designed to confirm whether the software will work in the expected environment for an acceptable amount of time without degradation. Effective reliability testing is difficult, and it becomes even more difficult when clear requirements are lacking.

Everyone expects the software to “work”, but hardly anyone wants to define what “work” means.

This is the great challenge that ISTQB certified “Technical Test Analysts” encounter while planning and executing reliability tests.

What is the meaning of reliability?

Let us first understand the meaning of reliability, because it is generally less well understood than other quality attributes such as functionality, performance, and security.

Reliability describes the ability of the software product to perform its required functions under stated conditions for a specified period of time or for a specified number of operations. Thus, when talking about reliability, we consider the following two important factors:

1) “Doing what?” (Stated conditions)

2) “For how long?” (Time or operations)

Reliability is measured by a specific failure intensity metric, such as the mean time between failures (MTBF). Software that fails on average once a week is considered less reliable than software that fails once a month. We also need to consider the severity of those failures and the conditions under which the software was operating (the “doing what?” element of the reliability definition).
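As a rough illustration of the once-a-week versus once-a-month comparison, MTBF can be computed as total operating time divided by the number of failures observed in that time. The sketch below is only a worked example: the 12-week observation window, the failure counts, and the function name are assumptions made for illustration, and a "month" is taken to be four weeks.

```python
def mean_time_between_failures(operating_hours: float, failure_count: int) -> float:
    """MTBF = total operating time / number of failures observed in that time."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return operating_hours / failure_count


HOURS_PER_WEEK = 7 * 24

# Hypothetical observation window: 12 weeks of continuous operation.
window_hours = 12 * HOURS_PER_WEEK

# Assumed failure counts: one per week versus one per four-week "month".
weekly_mtbf = mean_time_between_failures(window_hours, failure_count=12)
monthly_mtbf = mean_time_between_failures(window_hours, failure_count=3)

print(weekly_mtbf)   # 168.0 hours
print(monthly_mtbf)  # 672.0 hours
```

A higher MTBF indicates higher reliability, but the figure alone says nothing about how severe each failure was or what the software was doing at the time.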

How can we increase software reliability?

Software reliability can be improved by programming practices that “catch” error conditions as they occur and handle them in a defined manner.

For example, the software can generate an error message, perform an alternative action, or use default values if calculated values are found to be incorrect in some way.

This ability of the software to maintain a specified level of performance, rather than break, when a failure or an unexpected event occurs is called “Fault Tolerance”. The term “Robustness” is also used for this.
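The practices described above (catch the error, report it, fall back to a default) might look like the following minimal sketch. The discount calculation, the default value, and all names here are hypothetical and chosen only to show the pattern.

```python
import logging

logger = logging.getLogger("pricing")

DEFAULT_DISCOUNT = 0.0  # hypothetical safe fallback value


def compute_discount(order_total: float, loyalty_points: int) -> float:
    """Hypothetical calculation that can fail or return an implausible value."""
    return loyalty_points / order_total  # raises ZeroDivisionError if order_total is 0


def safe_discount(order_total: float, loyalty_points: int) -> float:
    """Handle errors in a defined manner instead of letting them propagate."""
    try:
        discount = compute_discount(order_total, loyalty_points)
    except ZeroDivisionError:
        # Error condition caught as it occurs: report it and use the default.
        logger.error("Discount calculation failed for order_total=%s", order_total)
        return DEFAULT_DISCOUNT
    if not 0.0 <= discount <= 1.0:
        # Calculated value is incorrect in some way: fall back to the default.
        logger.warning("Discount %s out of range; using default", discount)
        return DEFAULT_DISCOUNT
    return discount
```

A reliability test for such code would deliberately feed it the failing and out-of-range inputs and check that the defined fallback behaviour occurs.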

An important aspect of reliability refers to the ability of software to reestablish a specified level of performance and recover any data directly affected by the failure.

The “Recoverability” of software can be considered in terms of the following two aspects:

1) Fail-over capability: This refers to the ability to maintain continuous system operations even in the event of failure. In this case, the re-establishing of a specified level of performance may actually take place seamlessly and without getting noticed by the end users.

2) Restore capability: This refers to the ability to minimize the effects of a failure on the system’s data. If the recovery is required as a result of a catastrophic event such as a fire or an earthquake, we call it “Disaster Recovery”.
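The fail-over idea can be sketched in a few lines: a request is tried against a primary component and, if that fails, is passed to a standby without the caller noticing. This is a simplified illustration under stated assumptions, not a production design; the replica functions, the exception class, and the request string are all hypothetical.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when neither the primary nor any standby could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in order and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            errors.append(exc)  # remember the failure and move on to the next replica
    raise AllReplicasFailed(errors)


def primary(request: str) -> str:
    raise ConnectionError("primary is down")  # simulated failure


def standby(request: str) -> str:
    return f"standby handled {request}"


result = call_with_failover([primary, standby], "GET /balance")
print(result)  # standby handled GET /balance
```

A fail-over test would inject the failure of the primary, as simulated here, and then verify that service continues and that no request is lost.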

While considering the recoverability aspects of reliability, we need to give due consideration to the impact of a failure or disruption:

1) The criticality of system failures

2) The consequences of interruptions in normal operations (whether planned or not)

3) The implications of any data losses resulting from failures

What are the activities of Reliability Test Planning?

Test planning covers all the reliability attributes and involves the following primary activities:

1) Assessment of risks associated with reliability

2) Definition of an appropriate testing approach to address those risks

3) Setting reliability goals

4) Scheduling the tests

What can be the effects of poor reliability on different applications?

Reliability risks can affect different types of systems as well as different types of industries.

A few examples of applications where high levels of reliability are expected are described in the table below.

Sr.	Type of Application	Consequence of Poor Reliability
1.	Control software for chemical processes that need to run continuously	Exposure to the risk of uncontrolled chemical reactions taking place
2.	Software for military surveillance radar	Risk to a country’s defenses
3.	Online systems with worldwide user bases (e.g., eBay, Amazon)	Considerable financial loss and damage to their corporate images
4.	Check-in software for airlines	Delays to passengers and loss of market share
5.	Service Oriented Architectures (SOA) in which web-based business services offer general services for use by other applications	Loss of functionality for any application using this service

The following are examples where poor software recoverability can pose significant threats.

Sr.	Type of Application	Consequence of Poor Recoverability
1.	Safety-critical software that must not fail while in operation (e.g., flight control software).	Exposure to safety risk (e.g. aircraft crash).
2.	Business applications that, for example, make use of external systems and must provide at least basic standby functionality despite failure in those external systems. E.g., an online movie ticket reservation system may rely on an external application for credit card validation. If this fails, the system must still be able to accept reservations for later confirmation.	Basic standby functionality cannot be provided. In the example, the online movie ticket reservation system cannot accept unconfirmed reservations and will cause its business owner loss of revenue.
3.	Any application where downtimes must be minimized and could even be regulated by Service Level Agreements. E.g. a system for automatically collecting money from users of a rail network.	System takes too much time to restore to an agreed-upon level of service following failure or planned downtime. In the example, the system may not have recovered after scheduled night time maintenance by the time the rush hour starts. The system owner loses money, the operator may be fined for breach of SLA, and the users pay nothing.
4.	Any application where data backups are considered a necessity. E.g. an application used by a sales force may need to regularly back up its customer database.	Data loss as a result of scheduled or unplanned application downtime. In the example, the sales force may actually lose customer data (with a variety of consequences according to what data was lost for which customer).
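The "basic standby functionality" in the movie ticket reservation example (row 2 of the table above) can be illustrated with a small sketch: when the external credit card validation is unreachable, the reservation is accepted as pending for later confirmation instead of being refused. The class, its fields, and the status strings are hypothetical and serve only to show the behaviour a recoverability test would check.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReservationDesk:
    # Hypothetical stand-in for the external credit card validation service.
    validate_card: Callable[[str], bool]
    # Reservations accepted while validation was unavailable.
    pending: List[str] = field(default_factory=list)

    def reserve(self, reservation_id: str, card_number: str) -> str:
        try:
            card_ok = self.validate_card(card_number)
        except ConnectionError:
            # External system failed: keep the reservation for later confirmation.
            self.pending.append(reservation_id)
            return "pending"
        return "confirmed" if card_ok else "rejected"


def unreachable_validator(card_number: str) -> bool:
    raise ConnectionError("validation service unavailable")  # simulated outage


desk = ReservationDesk(validate_card=unreachable_validator)
status = desk.reserve("R-1001", "4111111111111111")
print(status, desk.pending)  # pending ['R-1001']
```

A test for this risk would simulate the outage of the external service, as done here, and confirm that reservations are still accepted and can be confirmed once the service returns.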