Know the Black-Box Software Testing Techniques to Test the Software Security
As demand for e-business and e-commerce grows and more and more applications gain Web access, the need for secure systems grows with it. Security-testing experts are therefore getting smarter at planning attack-driven testing, with the aim of plugging security holes before attackers find them. A breach in security can have many causes, ranging from accidental mistakes, through mischief done for fun, to serious crime.
Before elaborating on the different black-box techniques used in security testing, let us look at an important testing technique known as fuzz testing.
Fuzzing, or fuzz testing, is a typical black-box software testing methodology. It was developed and taught by Prof. Barton Miller at the University of Wisconsin in 1988.
Fuzzing is a powerful automated software testing technique that exercises many boundary cases by feeding invalid data to an application as input, with the aim of exposing exploitable vulnerabilities before they can be abused. The data can be delivered through files, network protocols, API calls, and other sources.
Different black-box techniques used in security testing are:
1) Fuzz Testing:
The objective of fuzzing is to feed a system an enormous amount of unexpected data in an attempt to crash it, thereby revealing security and reliability problems in the software, the network, or the operating system. The random data fed to the system is known as “fuzz”.
The tool that generates and delivers this data is known as a fuzzer or fuzz tester. Once it has exposed a problem in the system, the failing input helps testers narrow down the likely causes. Fuzzing continues to be used extensively by security engineers as well as by software testing and QA engineers.
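As a minimal sketch of the idea, the loop below generates random byte strings, feeds them to a target, and records every input that fails in an unexpected way. The target `parse_record`, its planted bug, and all other names are assumptions made up for illustration; a real fuzzer would drive an actual file, protocol, or API interface.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical system under test: a length-prefixed record parser.

    It contains a planted bug for illustration: it trusts the length byte
    and indexes past the end of short inputs.
    """
    if not data:
        raise ValueError("empty input")    # expected, well-handled rejection
    length = data[0]
    payload = data[1:]
    # Bug: no check that len(payload) >= length.
    return payload[length - 1] if length else 0


def fuzz(target, runs: int = 1000, max_len: int = 8, seed: int = 0):
    """Feed random byte strings ("fuzz") to target and collect crashing inputs."""
    rng = random.Random(seed)              # seeded so that a run can be repeated
    crashes = []
    for _ in range(runs):
        size = rng.randint(0, max_len)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass                           # documented rejection, not a failure
        except Exception as exc:           # anything else counts as a crash
            crashes.append((data, type(exc).__name__))
    return crashes


crashes = fuzz(parse_record)
print(f"{len(crashes)} crashing inputs found")
```

Keeping the crashing input next to the exception type is what makes the later analysis possible: each stored input can be replayed against the target on its own.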
2) Load Testing:
The most common attacks faced by quality assurance personnel are “Denial of Service” (DoS) situations, and the majority of DoS attacks are based on load. In load testing, the performance limits of the system are tested by repeating a test case rapidly and by running several tests in parallel. Any of these tests can be built from fuzz test cases. When a fuzz test is repeated very fast, it can discover problems that are missed by slowly executing fuzzing tools.
One such instance is testing for memory leaks or performance problems. If a test case indicates that there could be a problem, it can be extracted and loaded into a performance test tool through the record-and-playback functionality that most of these tools possess. Another benefit of load testing comes when testing proxy components such as gateways and firewalls. When a load-generation tool is used in parallel with fuzzing tools, the load-testing tool helps measure the change in the load tolerance of the system. Fuzz tests run under load can also produce different test results. Such conditions are realistic for a live server, which will almost always be under normal system load when an attack comes.
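The fast, parallel repetition described above can be sketched as follows. The `LeakyService` class is a hypothetical stand-in for the system under test, with a deliberately planted leak; the worker and repetition counts are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


class LeakyService:
    """Hypothetical SUT that keeps a reference to every malformed request."""

    def __init__(self):
        self._rejected = []                  # planted leak: never cleared

    def handle(self, request: bytes) -> bool:
        if not request.startswith(b"GET "):
            self._rejected.append(request)   # grows without bound
            return False
        return True

    def retained(self) -> int:
        return len(self._rejected)


def hammer(service, test_case: bytes, repetitions: int, workers: int) -> int:
    """Replay one test case quickly and in parallel; return retained-object growth."""
    before = service.retained()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every submitted call to complete before we measure
        list(pool.map(service.handle, [test_case] * repetitions))
    return service.retained() - before


service = LeakyService()
growth = hammer(service, b"\x00\xffBAD", repetitions=5000, workers=8)
print("objects retained after load:", growth)
```

A single execution of the test case retains one object, which is easy to overlook; repeating it thousands of times makes the growth obvious.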
3) Stress Testing:
A stress test changes the operational environment of the system under test (SUT) by restricting access to required resources. Examples of such changes include:
# Size and speed of available memory;
# Size and speed of available disk;
# The number of processors available and the processing speed of the processors;
# Environmental variables.
Most often, stress tests are executed in a test automation framework that lets us run the SUT inside a controlled environment such as a sandbox or a software simulation.
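As one small illustration of changing the operational environment, the sketch below runs a hypothetical SUT as a child process and controls exactly which environment variables it sees. The variable name `APP_TIMEOUT` and the SUT source are assumptions for the example, and the sketch assumes a POSIX-like system where the interpreter can start with an empty environment. Restricting memory, disk, or processors would need operating-system or sandbox facilities that are not shown here.

```python
import subprocess
import sys

# Hypothetical SUT, run as a child process: it reads a setting from the
# environment and falls back to a default when the variable is missing.
SUT_SOURCE = (
    "import os\n"
    "timeout = os.environ.get('APP_TIMEOUT', '30')\n"
    "print(int(timeout))\n"
)


def run_sut(env: dict) -> subprocess.CompletedProcess:
    """Run the SUT with exactly the environment given; nothing is inherited."""
    return subprocess.run(
        [sys.executable, "-c", SUT_SOURCE],
        env=env, capture_output=True, text=True, timeout=30,
    )


normal = run_sut({"APP_TIMEOUT": "5"})
stripped = run_sut({})                        # variable removed entirely
hostile = run_sut({"APP_TIMEOUT": "abc"})     # variable present but invalid

for name, result in [("normal", normal), ("stripped", stripped), ("hostile", hostile)]:
    print(name, result.returncode, result.stdout.strip())
```

In this sketch the SUT tolerates the missing variable but exits with an error on the invalid one, which is the kind of difference a stress test is meant to surface.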
4) Security Scanners:
Using security scanners in software development is common, but this is mostly due to a misunderstanding of a customer requirement. If a customer specifically requires it, the developer is bound to run a vulnerability scanner such as Nessus against the product, and in that case scanning becomes part of the software development practice itself.
5) Unit Testing:
Unit testing is the first place to introduce fuzzing: the smallest components of the software are tested through their available interfaces.
In unit testing, the SUT is a module used inside the actual application. Individual portions of functionality can be implemented as prototypes, so the actual logic of the application can be bypassed. This is useful when the actual implementation is not yet available, or when the target is a codec or a file parser. For instance, when testing HTML parsers, we do not necessarily want to run the tests against the full web browser; instead, we can use the same HTML parsing API calls through a test driver. In such a setup, the fuzzing process can run at very high speed.
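The test-driver setup can be sketched with Python's standard `html.parser.HTMLParser` standing in for the HTML parsing API. The `TagCounter` driver, the sample document, and the small mutation alphabet are assumptions chosen for illustration, not a recommended fuzzing strategy.

```python
import random
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Test driver: calls the parsing API directly, with no browser involved."""

    def __init__(self):
        super().__init__()
        self.tags = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1


def mutate(sample: str, rng: random.Random) -> str:
    """Replace one character with a random one from a small hostile alphabet."""
    alphabet = "<>/\"'=& \x00"
    pos = rng.randrange(len(sample))
    return sample[:pos] + rng.choice(alphabet) + sample[pos + 1:]


def fuzz_parser(sample: str, runs: int, seed: int = 0):
    """Feed mutated documents to the parser and collect unexpected exceptions."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        case = mutate(sample, rng)
        parser = TagCounter()
        try:
            parser.feed(case)
            parser.close()
        except Exception as exc:
            failures.append((case, repr(exc)))
    return failures


SAMPLE = '<html><body><a href="x">link</a></body></html>'
failures = fuzz_parser(SAMPLE, runs=2000)
print(f"2000 cases executed, {len(failures)} unexpected exceptions")
```

Because each case is a plain function call inside one process, thousands of cases complete quickly, which is the speed advantage of fuzzing at the unit level.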
6) Fault Injection:
Input fault injection is a technology closely related to fuzzing; the two terms are almost synonymous.
The term fault injection refers to a hardware testing technique in which artificial faults are introduced into printed circuit boards. For instance, the connections might be short-circuited, broken, grounded, or stuck to a predefined value such as “0” or “1.” The printed board is then used and the resulting behavior is observed. The purpose is to test the fault tolerance ability or sensitivity of the hardware to faults emerging during manufacturing or product lifetime.
Fault injection can be used to forecast the behavior of hardware during operations or to guide efforts on making the hardware more robust against flaws.
7) Syntax Testing:
Syntax testing is a technique for testing formal interfaces such as protocols, and its methods extend naturally to fuzzing.
Systems that interface with the public need to be robust and must validate their inputs thoroughly. There are malicious users in every population: infuriating people who delight in doing strange things to our systems. A few hours of attack by one of them can be worse than years of ordinary use and of bugs found by chance.
The purpose of syntax testing is to verify that the system performs some form of input validation on its critical interfaces. Every communication interface presents an opportunity for malicious use, but also for data corruption. Good software developers build systems that accept or tolerate any data, whether it fails to conform to the interface specification or is just garbage. Good software testing engineers, on the other hand, subject those systems to the most creative garbage possible. Syntax testing is not random; instead, it automates the smart fuzzing process by describing the operation, structure, and semantics of an interface. The inputs, whether internal or external, can be described with context-free grammars written in a notation such as Backus-Naur Form (BNF).
The strategy in syntax test design is to add one anomaly (or error) at a time while keeping all other components of the input structure or message correct. With a complex interface, this alone typically creates tens of thousands of dirty tests. When double and triple errors are added, the number of test cases increases exponentially.
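The one-anomaly-at-a-time strategy, and the growth that follows from adding double errors, can be sketched as below. The three-field message and the four-entry anomaly list are assumptions for illustration; a real interface has far more of both.

```python
from itertools import combinations, product

# Hypothetical valid message, described as named fields.
VALID = {"method": "GET", "path": "/index.html", "version": "HTTP/1.1"}

# Illustrative anomaly library: empty, over-long, format string, NUL byte.
ANOMALIES = ["", "A" * 1024, "%s%n", "\x00"]


def inject(valid: dict, errors: int):
    """Yield messages in which exactly `errors` fields carry an anomaly."""
    for fields in combinations(valid, errors):
        for values in product(ANOMALIES, repeat=errors):
            case = dict(valid)               # every other field stays correct
            case.update(zip(fields, values))
            yield case


def render(message: dict) -> str:
    """Serialize a message the way the hypothetical interface expects it."""
    return f"{message['method']} {message['path']} {message['version']}\r\n"


single = list(inject(VALID, 1))
double = list(inject(VALID, 2))
print(len(single), "single-error cases;", len(double), "double-error cases")
print(repr(render(single[0])))
```

With three fields and four anomalies there are 12 single-error cases and 48 double-error cases; the count multiplies by the size of the anomaly library with every additional simultaneous error.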
Different types of errors that can be produced in syntax testing are:
# Syntax errors: Syntax errors are gross violations of the grammar of the underlying language. They can exist on different levels of the grammar hierarchy: top level, intermediate level, and field level. The simplest field-level syntax errors consist of arbitrary data and random values. Intermediate-level and top-level syntax errors consist of omitting required elements and of repeating, reordering, or nesting elements or element substructures.
# Delimiter errors: Delimiters mark the separation of fields in a sentence. In ASCII-coded languages the fields are normally made up of characters and letters, and the delimiters are white-space characters (space, tab, line-feed, etc.), other delimiter characters (commas, semicolons, etc.), or combinations of these. Delimiters can be omitted, repeated, multiplied, or replaced by unusual or atypical characters. Paired delimiters such as braces can be left unbalanced. Wrong delimiters can also be added in places where they are not expected.
# Field-value errors: A field-value error is an illegal field in a sentence. Field-value errors can test for boundary-value errors with both numeric and non-numeric elements. Values exactly at the boundary of the range, or near it, should also be checked. Field errors can include values that are one below, one above, and entirely out of the range. Tests for fields with integer values should include boundary values. Using powers of two, plus or minus one, as boundary values is encouraged, since binary is the native representation of integers in computers.
# Context-dependent errors: These errors violate some particular property of a sentence that is generally not described in the grammar, because the grammar is context-free.
# State dependency errors: Not all sentences are acceptable in every possible state of a software component. A state dependency error is, for instance, a correct sentence sent during an incorrect state.
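Two of the error types above, field-value errors and delimiter errors, lend themselves to small generators. The sketch below is only illustrative: the function names, the 32-bit upper limit, and the choice of ";" as the replacement character are assumptions.

```python
def boundary_values(max_bits: int = 32):
    """Integers at each power of two, plus and minus one, up to 2**max_bits."""
    values = {0, -1}
    for bits in range(1, max_bits + 1):
        power = 2 ** bits
        values.update({power - 1, power, power + 1})
    return sorted(values)


def delimiter_errors(sentence: str, delimiter: str = ","):
    """Omit, repeat, and replace each occurrence of the delimiter in turn."""
    parts = sentence.split(delimiter)
    cases = []
    for i in range(1, len(parts)):
        head = delimiter.join(parts[:i])
        tail = delimiter.join(parts[i:])
        cases.append(head + tail)                    # delimiter omitted
        cases.append(head + delimiter * 2 + tail)    # delimiter repeated
        cases.append(head + ";" + tail)              # delimiter replaced
    return cases


print(boundary_values(8))
print(delimiter_errors("a,b,c"))
```

Each delimiter case changes a single delimiter and leaves the rest of the sentence intact, in line with the one-anomaly-at-a-time strategy.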
8) Negative Testing:
Negative testing comes in many forms. The most common type of negative testing is defining negative tests as use cases – for instance, if a feature implements an authentication functionality, a positive test would consist of trying the valid user name and valid password. Everything else is negative testing, including wrong user name, wrong password, someone else’s password, and so on. Instead of explaining the various forms of manual tactics for negative testing, we will focus on explaining the automated means of conducting negative testing.
The sole purpose of robustness testing is to run negative tests; the content of the responses from the SUT is not evaluated at all. Robustness testing is a model-based negative testing approach that generates test cases or test sequences from a machine-understandable description of a use case (the model) or a template. The model consists of protocol building blocks such as messages and sequences of messages, with various dynamic operations implemented with intelligent tags. Each message consists of a set of protocol fields, elements whose syntax (form) and semantics (meaning) are defined in a protocol specification.
Robustness testing is an automated means of conducting negative testing using syntax testing techniques.
The greatest difference from fuzzing is that robustness testing almost never involves any randomness. The tests are created by systematically applying a known set of destructive or anomalous data to the model. The resulting tests are often built into a test tool consisting of a test driver, test data, test documentation, and the necessary interfaces to the test bed, such as monitoring tools and test controllers. The robustness tests can also be released as a test suite consisting of binary test cases, or of their descriptions, for use with other test automation frameworks. Pre-built robustness tests are always repeatable and can be automated in a fashion in which human involvement is minimal.
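The absence of randomness can be sketched as follows: a small model describes the fields of a message, and a fixed anomaly library is applied to it systematically, so the same suite is produced on every run. The login message, the field kinds, and the anomaly values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    kind: str       # "text" or "int" in this sketch
    valid: str      # a known-good value for the field


# Hypothetical model of a login message.
MODEL = [Field("user", "text", "alice"), Field("pin", "int", "1234")]

# Illustrative library of known anomalies, keyed by field kind.
ANOMALY_LIBRARY = {
    "text": ["", "A" * 4096, "../../etc/passwd"],
    "int": ["-1", "0", "4294967296", "12a4"],
}


def build_suite(model):
    """Apply every known anomaly to every field, one field at a time."""
    suite = []
    for index, field in enumerate(model):
        for anomaly in ANOMALY_LIBRARY[field.kind]:
            values = [f.valid for f in model]
            values[index] = anomaly
            case_id = f"{field.name}-{len(suite):03d}"
            message = "&".join(f"{f.name}={v}" for f, v in zip(model, values))
            suite.append((case_id, message))
    return suite


suite = build_suite(MODEL)
print(len(suite), "test cases; first:", suite[0])
```

Because every case has a stable identifier and content, the suite can be shipped as a fixed set of test cases and rerun unchanged against later versions.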
9) Regression Testing:
Testing does not end with the release of the software. Corrections and updates are required after the software has been launched, and all new versions and patches need to be verified so that they do not introduce new flaws or reintroduce old ones. Post-release testing is also known as regression testing. Regression testing needs to be highly automated and fast. The tests also need to be stable and easy to configure: a minor update to the communication interface can end up invalidating all regression tests if the tests are difficult to modify.
Regression testing can be related to the following two laws that apply to software testing:
1) Every testing method we deploy in software development, and every test case we add to our regression testing, will leave behind a residue of subtler bugs against which such tests are simply ineffective. We need to be prepared to keep integrating new techniques and tests into our processes.
2) Software complexity (and therefore the complexity of bugs) grows to the limits of our ability to manage that complexity. By eliminating “easy” bugs, we allow the complexity of the software to increase to a level where the subtler bugs become numerous and far more significant.
The more we test the software, the more immune it becomes to our test cases. The remedy is to continually write new and different tests that exercise different parts of the software. Whenever a new flaw is found, it is important to analyze that individual bug and see whether there is a more systematic approach to catching it and similar mistakes. A common mistake when integrating fuzzing-related flaws into regression testing is to add one single test to the regression test database, when a more robust solution would be to integrate a suite of test cases that prevents variants of that flaw.
Hence regression tests should avoid fixed, non-deterministic, or magic values. A bad security-related example would be regression testing for a buffer overflow with one fixed length. A flaw that was initially triggered with a string of 200 characters might later re-emerge as a variant that is triggered with 201 characters. Modifying the tests should also not result in missed bugs in the most recent release of the software. Regression tests should be constantly updated to catch newly found issues.
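The buffer overflow example can be sketched as a regression test that covers a family of lengths around the original trigger instead of the single value 200. The function `copy_name` and its 256-character capacity are hypothetical stand-ins for the fixed code, and the list of lengths is only one reasonable choice.

```python
import unittest


def copy_name(name: str, capacity: int = 256) -> str:
    """Hypothetical fixed function: rejects over-long input instead of overflowing."""
    if len(name) >= capacity:
        raise ValueError("name too long")
    return name


class OverflowRegression(unittest.TestCase):
    def test_lengths_around_and_beyond_original_trigger(self):
        # The original flaw was triggered at 200 characters. Cover its
        # neighbours and other boundary lengths, not just that one value.
        lengths = [199, 200, 201, 255, 256, 257, 1023, 1024, 1025, 65536]
        for length in lengths:
            with self.subTest(length=length):
                try:
                    result = copy_name("A" * length)
                except ValueError:
                    continue                 # clean rejection is acceptable
                self.assertEqual(len(result), length)


tests = unittest.defaultTestLoader.loadTestsFromTestCase(OverflowRegression)
outcome = unittest.TextTestRunner(verbosity=0).run(tests)
print("regression suite passed:", outcome.wasSuccessful())
```

Using `subTest` reports each failing length separately, so a variant that re-emerges at 201 characters is identified directly.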
The flaws recorded in the regression database give a good overview of past mistakes, and this is very valuable information for developers and other testers. The regression database should be constantly reviewed and analyzed from a learning perspective. A bug database can reveal valuable information about critical flaws and their potential security consequences. This, in itself, is a metric of the quality of various products.
Advantages of Fuzzing:
1) Fuzzing is one of the most powerful testing methodologies available, and it can uncover many security issues in our software product.
2) It compares favorably with many code-auditing tools. Such tools do find a good number of code-related problems, but a careful comparison of the findings from well-planned fuzzing against those from a full-fledged code audit will often come out in favor of fuzzing.
3) The majority of results reported by code-auditing tools are false positives that pose virtually no security threat. Fuzzing does not report false positives of this kind: a crash it detects is a real crash, and a bug it detects is a real bug.
4) Every problem detected by fuzzing a remotely reachable interface can potentially be exploited remotely; how far this holds depends on the interface being fuzzed.
5) Fuzzing is very helpful for analyzing proprietary systems, off-the-shelf software, and closed-source applications, because in the majority of cases access to the source code is not required.
An expert on R&D, Online Training and Publishing. He is M.Tech. (Honours) and is a part of the STG team since inception.