Approaches to Code Complexity Testing: Cyclomatic Complexity
Two questions that come to mind while doing code complexity testing are:
Question – 1: Which of the paths are independent?
If two paths are not independent, then we may be able to minimize the number of tests.
Question – 2: Is there any limit on the number of tests that must be run to ensure that all the statements have been executed at least once?
The answer to both of these questions is a metric that quantifies the complexity of a program, known as Cyclomatic Complexity.
It is also known as structural complexity because it gives an internal view of the code. It is a number that provides an upper bound on the number of tests that must be conducted to ensure that all statements have been executed at least once.
McCabe IQ covers about 146 different counts and measures. These metrics are grouped into six main “collections”, each of which provides a different level of granularity and information about the code being analyzed.
The collections of the various metrics are as follows:
1) McCabe Metrics: are based on Cyclomatic Complexity, V(G).
2) Execution Coverage metrics: are based on any of Branch, Path or Boolean coverage.
3) Code Grammar metrics: are based around line counts and code structure counts such as Nesting.
4) OO Metrics: are based on the work of Chidamber and Kemerer.
5) Derived Metrics: are based on abstract concepts such as understandability, maintainability, comprehension and testability.
6) Custom Metrics: are imported from third party software / systems, e.g. defects count.
McCabe IQ provides about 100 individual metrics at the method, procedure, function, control and section / paragraph level. 40 additional metrics are available at the class / file and program level.
Categories of Metrics: There are three categories of metrics:
1) McCabe Metrics.
2) OO Metrics.
3) Grammar Metrics.
When collecting metrics, we rely upon subordinates who need to ‘buy into’ the metrics program. Hence, it is important to collect only what you intend to use. We should also remember ‘The Hawthorne Effect’, which states that when you collect metrics on people, the people being measured will change their behavior. Collecting metrics that are never used, or using metrics to judge individuals, will destroy the effectiveness of any metrics program.
Let us now discuss these three categories of metrics.
1) McCabe Metrics:
a) Cyclomatic Complexity, V(G): It is a measure of the amount of logic in a code module of third- and fourth-generation languages. If V(G) is excessively high, it leads to impenetrable code, i.e., code that is at higher risk due to difficulty in testing. The threshold value is 10: when V(G) > 10, the likelihood of the code being unreliable is much higher. It must be remembered that a high V(G) shows decreased quality in the code, resulting in more defects that become costly to fix. A rough estimate of V(G) can be obtained by counting decision points, as sketched below.
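As an illustration, here is a minimal Python sketch (the grade function and the counting rules are illustrative assumptions) that approximates V(G) for a module by counting decision points, a common simplification of the graph-based definition V(G) = e - n + 2P:

import ast

def estimate_v_g(source):
    # Approximate V(G) as 1 + number of decision points; this is a
    # simplification of the graph-based definition V(G) = e - n + 2P.
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            decisions += len(node.values) - 1   # each extra and/or adds a branch
    return decisions + 1

sample = """
def grade(score):
    if score >= 90:
        return 'A'
    elif score >= 75:
        return 'B'
    elif score >= 60:
        return 'C'
    return 'F'
"""
v_g = estimate_v_g(sample)
print(v_g, "review recommended" if v_g > 10 else "within the threshold of 10")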
b) Essential Complexity: It is a measure of the degree to which a code module contains unstructured constructs. If the essential complexity is excessively high, it leads to impenetrable code, i.e., code that is at higher risk due to difficulty in testing. Furthermore, a higher value leads to increased cost due to the need to refactor, or worse, re-engineer the code. The threshold value is 4: when the essential complexity is more than 4, the likelihood of the code being unmaintainable is much higher. Remember that a high essential complexity indicates increased maintenance costs and decreased code quality.
Some organizations have used the Essential Density metric (EDM) and it is defined as
EDM = (Essential complexity) / (Cyclomatic complexity)
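A minimal Python sketch, assuming hypothetical values for EV(G) and V(G), showing how EDM and the essential complexity threshold of 4 might be checked:

def essential_density(ev_g, v_g):
    # EDM = essential complexity / cyclomatic complexity
    return ev_g / v_g

ev_g, v_g = 6, 12          # hypothetical measurements for one module
print(f"EDM = {essential_density(ev_g, v_g):.2f}")
if ev_g > 4:
    print("Essential complexity above the threshold of 4: maintainability risk")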
c) Integration Complexity: It is a measure of the interaction between the modules of code within a program. S0 and S1 are two derivatives of this complexity,
where S0 provides an overall measure of the size and complexity of a program’s design. It does not reflect the internal calculations of each module; it is the sum of all integration complexity in the program,
and S1 = (S0 – Number of methods + 1).
S1 is primarily used to determine the number of tests needed to ensure that the application executes without issues in module interaction.
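For example, a minimal sketch (the figures are hypothetical) applying the relation S1 = S0 – number of methods + 1:

def integration_tests(s0, number_of_methods):
    # S1 = S0 - number of methods + 1
    return s0 - number_of_methods + 1

# Hypothetical program: total integration complexity 48 spread over 12 methods
print("S1 =", integration_tests(48, 12))   # 37 tests for module interaction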
d) Cyclomatic Density (CD): It is a measure of the amount of logic in the code.
It is expressed as follows:
CD = (Number of decisions made) / (Lines of executable code)
By eliminating the size factor, this metric singles out modules that have unusually dense decision logic. Remember that the higher the CD value, the denser the logic.
The CD metric should be in the range of 0.14 to 0.42 for the code to be simple and comprehensible.
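A minimal sketch, assuming hypothetical counts, that computes CD and checks the 0.14 to 0.42 range:

def cyclomatic_density(decisions, executable_loc):
    # CD = decisions made / lines of executable code
    return decisions / executable_loc

cd = cyclomatic_density(decisions=9, executable_loc=40)
status = "within 0.14-0.42" if 0.14 <= cd <= 0.42 else "outside 0.14-0.42"
print(f"CD = {cd:.2f} ({status})")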
e) Pathological Complexity: It represents extremely unstructured code, which indicates a poor design and hence suggests re-engineering of the code. A value greater than one indicates poor coding practices, such as branching into a loop or into a decision. Please note that these conditions are not easy to replicate with modern post-3GL languages.
f) Branch Coverage: It is a measure of how many branches or decisions in a module have been executed during testing.
If the branch coverage is < 95% for new code or < 75% for code under maintenance, then the test scripts require review and enhancement.
g) Basis Path Coverage: A measure of how many of the basis (cyclomatic, V(G)) paths in a module have been executed. Path coverage is the most fundamental of the McCabe coverage measures. It indicates how much of the logic in the code is covered or not covered. This technique requires more thorough testing than branch coverage.
If the path coverage is < 90% for new code or < 70% for code under maintenance, then the test scripts require review and enhancement.
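The two coverage thresholds above can be checked together; a minimal sketch, with hypothetical coverage figures:

def coverage_adequate(branch_pct, path_pct, new_code=True):
    # Thresholds from the text: branch 95%/75%, basis path 90%/70%
    # for new code versus code under maintenance.
    branch_min, path_min = (95, 90) if new_code else (75, 70)
    return branch_pct >= branch_min and path_pct >= path_min

print(coverage_adequate(branch_pct=97, path_pct=88, new_code=True))    # False: path < 90
print(coverage_adequate(branch_pct=80, path_pct=72, new_code=False))   # True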
h) Boolean Coverage: A technique used to establish that each condition within a decision is shown by execution to independently and correctly affect the outcome of the decision.
The major application of this technique is in safety critical systems and projects.
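This requirement corresponds to what is commonly called modified condition/decision coverage (MC/DC). A small Python sketch, for a hypothetical decision a and (b or c), that searches for pairs of test vectors showing each condition’s independent effect on the outcome:

from itertools import product

def decision(a, b, c):
    # Hypothetical decision with three conditions
    return a and (b or c)

def independence_pairs(fn, n_conditions):
    # For each condition, find two test vectors that differ only in that
    # condition and change the outcome of the decision.
    vectors = list(product([False, True], repeat=n_conditions))
    pairs = {}
    for i in range(n_conditions):
        for v in vectors:
            w = tuple(not x if j == i else x for j, x in enumerate(v))
            if fn(*v) != fn(*w):
                pairs[i] = (v, w)
                break
    return pairs

for cond, (v, w) in independence_pairs(decision, 3).items():
    print(f"condition {cond}: {v} vs {w}")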
i) Combining McCabe Metrics: Cyclomatic complexity is the basic indicator for determining the complexity of logic in a unit of code. It can be combined with other metrics as well. For example,
1) Code Review Candidate
If V(G) > 10 and the essential complexity or essential density exceeds 4, then the unit needs a review.
2) Code refactoring:
If V(G) > 10 and the condition
V(G) – EV(G) ≤ V(G) is true
Then, the code is a candidate for refactoring.
3) Inadequate Comment Content:
If the graph of V(G) against the comment percentage (in terms of LOC) does not show a linear increase, then the comment content needs to be reviewed.
4) Test Coverage
If the graph of V(G) against path coverage does not show a linear increase, then the test scripts need to be reviewed.
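A minimal sketch of how the first two heuristics might be automated; the refactoring check here is simplified to "high V(G) but essential complexity within its threshold", which is an interpretation for illustration rather than the exact condition stated above:

def mccabe_flags(v_g, ev_g):
    flags = []
    if v_g > 10 and ev_g > 4:
        flags.append("code review candidate (complex and unstructured)")
    if v_g > 10 and ev_g <= 4:
        flags.append("refactoring candidate (complex but structured)")
    return flags

print(mccabe_flags(v_g=14, ev_g=6))   # ['code review candidate (complex and unstructured)']
print(mccabe_flags(v_g=14, ev_g=2))   # ['refactoring candidate (complex but structured)']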
2) OO Metrics:
a) Average V(G) for a Class: If the average V(G) > 10, then this metric indicates a high level of logic in the methods of the class, which in turn indicates a possible dilution of the original object model. If the average is high, the class should be reviewed for possible refactoring.
b) Average Essential Complexity for a Class: If the average is greater than one then it may indicate a dilution of the original object model. If the average is high, then the class should be reviewed for possible refactoring.
c) Number of Parents: If the number of parents for a class is greater than one then it indicates a potentially overly complex inheritance tree.
d) Response for Class (RFC): RFC is the count of all methods within a class plus the number of methods accessible to an object of this class due to implementation. Please note that the larger the number of methods that can be invoked in response to a message, the greater is the difficulty in comprehension and testing of the class. Also note that low values indicate greater specialization. If RFC is high then making changes to this class will be increasingly difficult due to the extended impact to other classes (or methods).
e) Weighted Methods for Class (WMC): WMC is the count of methods implemented in a class. It is a strong recommendation that WMC does not exceed the value of 14. This metric is used to show the effort required to re-write or modify the class. The aim is to keep this metric low.
f) Coupling Between Objects (CBO): It indicates the number of non-inherited classes that this class depends on. It shows the degree to which this class can be re-used.
For Dynamic Link Libraries (DLLs) this measure is high, as the software is deployed as a complete entity.
For executables (EXEs) it is low, as reuse is to be encouraged there.
Remember that strong coupling increases the difficulty of comprehending and testing a class. Our objective must be to keep CBO less than 6.
g) Class Hierarchy Level: It shows the degree of inheritance used. If it is greater than 6, the increased depth increases the testing effort. If it is less than 2, the value shows a poor exploitation of OO. So, one should aim for 2 to 3 levels only in OO code.
h) Number of Methods (n): If the number of methods (n) > 40, then the class has too much functionality and can be split into several smaller classes. Please note that ideally one should aim for no more than 20 methods in a class.
i) Lack of Cohesion between Methods (LOCM): It is a metric that measures the dissimilarity of the methods in a class in terms of the instance variables or attributes they use.
How is it computed? The percentages of methods in the class using each attribute are averaged and subtracted from 100. The measure is expressed as a percentage.
Two situations arise:
1) If the percentage is low, it means simplicity and high reusability.
2) If the percentage is high, it means the class is a candidate for refactoring and could be split into two or more smaller classes.
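A minimal sketch, using a hypothetical class with five methods and three attributes, of the calculation described above:

def lack_of_cohesion(attribute_usage, n_methods):
    # attribute_usage maps each attribute to the number of methods using it.
    # Average the percentage of methods using each attribute, subtract from 100.
    if not attribute_usage or n_methods == 0:
        return 0.0
    avg_pct = sum(c / n_methods * 100 for c in attribute_usage.values()) / len(attribute_usage)
    return 100 - avg_pct

# Hypothetical class: 5 methods; 'name' used by all, 'rate' and 'cache' by one each
print(f"LOCM = {lack_of_cohesion({'name': 5, 'rate': 1, 'cache': 1}, n_methods=5):.1f}%")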
Combined OO Metrics: V(G) can also be used to evaluate OO systems. It is used together with OO metrics to identify suitable candidates for refactoring.
Refactoring means making a small change to the code which improves its design without changing its semantics.
Rules for Refactoring:
Rule – 1: If avg. V(G) > 10 (high) and the number of methods (n) is < 10 (low) then the class requires refactoring.
Rule – 2: If avg. V(G) is low and the lack of cohesion is high then the class is a suitable candidate for refactoring into two or more classes.
Rule – 3: If avg. V(G) is high and CBO is high, then the class is also a candidate for refactoring.
Rule – 4: If CBO is high and lack of cohesion is high then the class is a candidate for refactoring.
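A minimal sketch applying Rules 1-4; the cut-off used here for a 'high' lack of cohesion (> 50%) is an assumption, while avg. V(G) > 10 and CBO > 6 follow the thresholds given earlier:

def needs_refactoring(avg_v_g, n_methods, locm_pct, cbo):
    high_vg = avg_v_g > 10
    high_locm = locm_pct > 50        # assumed cut-off for 'high' lack of cohesion
    high_cbo = cbo > 6               # CBO objective: keep it below 6
    return ((high_vg and n_methods < 10) or          # Rule 1
            (not high_vg and high_locm) or           # Rule 2
            (high_vg and high_cbo) or                # Rule 3
            (high_cbo and high_locm))                # Rule 4

print(needs_refactoring(avg_v_g=12, n_methods=8, locm_pct=30, cbo=3))   # True (Rule 1)
print(needs_refactoring(avg_v_g=6,  n_methods=15, locm_pct=40, cbo=4))  # False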
3) Grammar Metrics:
a) Line Count: It is a size indicator. It is used in
# Estimation techniques like COCOMO II.
# Measuring defect density = (Number of defects) / (1000 LOC), i.e., defects per KLOC.
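For instance, a one-line Python helper (with hypothetical figures) for the defect density formula above:

def defect_density(defects, loc):
    # Defects per 1000 lines of code (KLOC)
    return defects / (loc / 1000)

print(defect_density(defects=18, loc=24000))   # 0.75 defects per KLOC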
b) Nesting Levels: Nesting of IF statements, switch and loop constructs can indicate unnecessarily complex conditions, which make future modifications quite difficult, so refactoring may be done. Typical industry standards are 4, 2 and 2 for IF, switch and loop constructs respectively.
c) Counts of Decision Types: It is used to show single-outcome (IF and loop) and multiple-outcome decision statements. When used in conjunction with V(G), its value can determine whether a method / procedure / control / section is overly complex and hence a suitable candidate for refactoring.
d) Maximum Number of Predicates: This measure shows overly complex decision statements, which are candidates for refactoring.
e) Comment Lines: It indicates the level of comments in a unit of code. It shows
Documentation level (within the code) = (Comment lines) / LOC
It may also be expressed relative to the logic as (Comment lines) / V(G).
Historically, a comment ratio of 15-25% is adequate to enable any user to understand the code.
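A minimal sketch, with hypothetical counts, checking the documentation-level ratio against the 15-25% guideline:

def comment_ratio(comment_lines, loc):
    # Documentation level (within the code) = comment lines / LOC
    return comment_lines / loc * 100

ratio = comment_ratio(comment_lines=120, loc=650)
status = "within the 15-25% guideline" if 15 <= ratio <= 25 else "outside the 15-25% guideline"
print(f"{ratio:.1f}% comments ({status})")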