# Data flow testing

In control flow coverage, we were concerned, about the flow of control within the code. In data flow coverage, we are concerned about the flow of data within the code. From data, we mean the variables, references or objects, which are created, used and destroyed during the execution of the program. There are many logical errors, which remain hidden if we skip the data flow coverage. For example, look at the following code snippet.

Int x;

If(x<1000)

print(“X is less than 1000”);

What logical mistake is there in this code? If you are a programmer, you would know that the variable x was used but never defined. The correct sequence for any variable or object is

- Definition
- Use
- Killing

## Definition

We can conceive variables as data buckets. The variable is defined when we put something in the bucket by replacing the initial content. For instance, when write “ int x; “ the variable is declared but not defied as there is some junk value in it. When we write x =100, it is definition as now the variable will contain the value 100 replacing any previous garbage value.

## Use

The use of a variable means when we make use of its contents. The use may be categorized in two types.

### Predicate Use

When we use the variable in some predicate logic like comparison operators, it is called predicate use or p use in short. For example, in below code, we see the p use of x

If( x === 0)

print(“zero”);

The predicate use returns false or true depending upon the value.

### Computation Use

When we use the variable in some sort of mathematical computation, it is called computation use or c use in short.

## Killing

The variable is killed when it frees up all the allocated memory. We declare variables in two ways.

- On Stack Memory
- On Heap Memory

### On Stack

The variables defined on stack are usually automatically killed as soon as the control leaves the scope of the variable. We do not need to delete those variables consciously.

### On Heap

The variables defined on heap are little complicated ones. They need to be destroyed manually from heap. If we keep on defining the variables on heap and do not destroy them, the memory leakage will occur. This memory leakage will slow down the system and the only solution to free up the unnecessarily occupied space is to restart the computer system. When we define an object on heap, the object variable behaves like pointer and is itself defined at stack. However, the memory to where it is pointing resides at heap. When the control goes out of the scope the object variable. The Object variable is destroyed but the memory it was pointing to, is not destroyed. For example, consider the following piece of code

{

{

Int ptr* = new int[10];

Delete ptr; // accessible here

}

Delete ptr; // not accessible here

}

In this code the pointer variable ptr itself is on stack. Therefore, it is automatically destroyed when control leaves the scope. The memory location where it is pointing is not automatically destroyed. To destroy that memory location we need to write the statement delete ptr within the scope of the pointer variable. If we forget to write it there, the memory leakage will occur and we would never be able to reclaim that memory until we reboot the system.

### Normal or Valid Data Flow

~d defined and it was the first occurrence of the variable

du Defined followed by used

kd killed followed by defined

ud used followed by defined

uk used followed by killed

uu used followed by used

k~ killed and it was the last occurrence of the variable.

u~ used and it was the last occurrence of the variable.

### Abnormal or Invalid Data Flow

~u used and it was the first occurrence of the variable. It is a bug as we should not use a variable before defining it otherwise we would be using its garbage value.

~k killed and it was its first occurrence of the variable. It is also a bug as what is the use of killing a variable without defining and using it.

dd defined followed by defined. It is also a bug as defining variable again will replace the previous value. We should use a variable before defining it again otherwise; it is a wastage of time and space.

dk defined followed by killed. It is a bug as the variable was defined and then killed without being used anywhere in the code. Again, it is the wastage of time and space.

kk killed followed killed. It is like overkilling. Mind it that the variables are also like humans, once killed, they are gone forever. Killing twice will waste the energy only.

ku killed followed by used. It is something impossible that how we can use a variable that has been destroyed. Therefore, it is also a bug.

d~ defined and it is the last occurrence of the variable. It is a bug as there is no use of defining a variable when it is not being used anywhere in the code.

## How to write test cases for data flow testing.

There are different approaches to write test cases for data flow testing. In this tutorial we will discuss

- Using control flow graph.
- Using data driven paths

### Using control flow graph.

In this approach, we follow these steps.

- Draw control flow graph for the given piece of code.
- Calculate cycolmatic complexity from the control flow graph and draw basis paths according to the cyclomatic complexity.
- Now, for each basis path, see how many variables are involved.
- For every variable in the given basis path, trace the defined, used and killed sequence in the valid order and write test case for every variable in the given basis path.
- Repeat the process for all the basis paths.

Say there are 5 basis paths and there are three variables in every path, you will write three test cases for every basis paths and there would be 5 x 3 =15 test cases in total.

### Example

In this control flow graph, there are two predicate nodes or binary decision points. Therefore, the complexity of the control flow graph would be predicate nodes + 1 = 2 +1 = 3.

For basis paths, we take an arbitrary path through the graph. Say, we have the following first arbitrary path.

Path 1 = 1, 2, 4, 5, 7

In next path, we would flip the branch of the first decision point and the remaining path would remain the same as the first arbitrary path.

Path 2 = 1, 3, 5, 7

In next path, we would flip the branch of the second decision point and the remaining path would remain same as the first arbitrary path.

Path 3 = 1, 2, 4, 5, 6, 7

We traced out the three basis paths as per cyclomatic complexity. Now in every path, we would trace the define, use and kill flow for every variable (x, y, z). Therefore, we may have to write 3 x 3 =9 test cases. We may write more or less test cases as well. the idea is to trace every viable in every basis path for all the define, use and kill sequences.

## Data Flow Testing Using Data Driven Paths

In this technique, we trace out a path using the data variables and analyze the system by definitions and usages. For instance consider the below flow chart of a simple code.

Let us study two paths. The first paths follows the nodes 1,2,3,4,8 when we define the variable x = 1 and the second path follows the nodes 1, 2, 4, 5, 6, 5, 6, 5, 7, 8 when we define x = -1. Now we can analyze each using the following criteria.

- All definitions coverage.
- All c-use coverage.
- All c-use some p-use coverage.
- All p-use coverage.
- All p-use some c-use coverage.
- All use coverage.
- All du coverage.

### All Definitions Coverage

In all definitions coverage, we try to cover the definition clear paths to the first use. The path starts from the definition of the variable and ends at its first use. The use may be predicate or computational. In above example the definition clear path for variable x starts from node 1 and its first use is the predicate use at node 2. We write it as

(1, (2,t), x) // Here 1 is the starting node where x was defined and 2 is node where it was first time used. Since it is a predicate use, so we add t or f. The t for true condition and f for false condition. Other definition clear paths are as follows.

(3, 8, a). //The definition of data variable at node 3 and its first use is at node 8. Since it is the computational use, we did not add f or t for false or true branches.

(6, 6, x). // another definition of x is at node 6 and it is also used on the same node. Again, it is a computation use.

(7, 8, a). // The data variable a is again defined at node 7 followed by first use at node 8.

### All C-Use Coverage.

In all c-use coverage, we analyze the definition clear path for every variable from its definition node to its all computation uses. Here the condition is that the use should be computation type not predicate type. In above flow chart, we find following C-Uses.

(1, 3, x) // the variable x defined at node 1 followed by its computation use at node 3.

(1, 6, x) // the variable x defined at node 1 followed by its computation use at node 6.

(1, 7, x) // the variable x defined at node 1 followed by its computation use at node 7.

(3, 8, a) // the variable a defined at node 3 followed by its computation use at node 8.

(6, 6, x) // the variable x defined at node 6 followed by its computation use at node 6.

(6, 7 , x) // the variable x defined at node 6 followed by its computation use at node 7.

(7, 8 , a) // the variable a defined at node 7 followed by its computation use at node 8.

### All c-use some p-use coverage

Here we try to find the definition clear path from every data variable to all c-uses or computation uses and if we cannot find the definition clear path to any of the c-use, we try to find the definition clear path to its predicate use. All c-use some p-use coverage for the above example would be

(1, 3, x) // the variable x defined at node 1 followed by its computation use at node 3.

(1, 6, x) // the variable x defined at node 1 followed by its computation use at node 6.

(1, 7, x) // the variable x defined at node 1 followed by its computation use at node 7.

(3, 8, a) // the variable a defined at node 3 followed by its computation use at node 8.

(6, 6, x) // the variable x defined at node 6 followed by its computation use at node 6.

(7, 8 , a) // the variable a defined at node 7 followed by its computation use at node 8.

Note, it is same as all c-use coverage. The reason is, we do have c-use for every variable. We will look for p-use only if do not find c-use for that variable.

### All p-use coverage

Here we find the definition clear path from definition of a variable to all p-uses or predicate uses. For above example, we have following p-uses.

(1, (2, t), x). // the variable x defined at node 1 followed by its predicate use at node 2. The notation t is for true condition of the predicate node.

(1, (2, f), x). // the variable x defined at node 1 followed by its predicate use at node 2. The notation f is for false condition of the predicate node.

(1, (4, t), x). // the variable x defined at node 1 followed by its predicate use at node 4. The notation t is for true condition of the predicate node.

(1, (4, f), x). // the variable x defined at node 1 followed by its predicate use at node 4. The notation f is for false condition of the predicate node.

(1, (5 ,t), x). // the variable x defined at node 1 followed by its predicate use at node 5. The notation t is for true condition of the predicate node.

(1, (5, f), x). // the variable x defined at node 1 followed by its predicate use at node 5. The notation f is for false condition of the predicate node.

(6, (5 ,t), x). // the variable x defined at node 6 followed by its predicate use at node 5. The notation t is for true condition of the predicate node.

(6, (5, f), x). // the variable x defined at node 6 followed by its predicate use at node 5. The notation f is for false condition of the predicate node.

### All p-use some c-use coverage.

Here, we try to find trace all the p-uses of a data variable from its definition. If we cannot find any p-use for a definition of a data variable, we firs a c-use. Note in above example, we could not any p-use for definitions at node 3 and 7. So we will write c-use for those definitions. The rest of the coverage would remain the same.

(1, (2, t), x). // the variable x defined at node 1 followed by its predicate use at node 2. The notation t is for true condition of the predicate node.

(1, (2, f), x). // the variable x defined at node 1 followed by its predicate use at node 2. The notation f is for false condition of the predicate node.

(1, (4, t), x). // the variable x defined at node 1 followed by its predicate use at node 4. The notation t is for true condition of the predicate node.

(1, (4, f), x). // the variable x defined at node 1 followed by its predicate use at node 4. The notation f is for false condition of the predicate node.

(1, (5 ,t), x). // the variable x defined at node 1 followed by its predicate use at node 5. The notation t is for true condition of the predicate node.

(1, (5, f), x). // the variable x defined at node 1 followed by its predicate use at node 5. The notation f is for false condition of the predicate node.

(6, (5 ,t), x). // the variable x defined at node 6 followed by its predicate use at node 5. The notation t is for true condition of the predicate node.

(6, (5, f), x). // the variable x defined at node 6 followed by its predicate use at node 5. The notation f is for false condition of the predicate node.

And here c-uses for missing p-uses

(3, 8, a) // the variable a defined at node 3 followed by its computation use at node 8.

(7, 8 , a) // the variable a defined at node 7 followed by its computation use at node 8.

## All Uses Coverage

It includes all definition clear paths from definitions of the variables to all the c-uses and p-uses. In other words, the union of all p-uses and c-uses would be.

(1, 3, x) // the variable x defined at node 1 followed by its computation use at node 3.

(1, 6, x) // the variable x defined at node 1 followed by its computation use at node 6.

(1, 7, x) // the variable x defined at node 1 followed by its computation use at node 7.

(3, 8, a) // the variable a defined at node 3 followed by its computation use at node 8.

(6, 6, x) // the variable x defined at node 6 followed by its computation use at node 6.

(6, 7 , x) // the variable x defined at node 6 followed by its computation use at node 7.

(7, 8 , a) // the variable a defined at node 7 followed by its computation use at node 8.

(1, (2, t), x). // the variable x defined at node 1 followed by its predicate use at node 2. The notation t is for true condition of the predicate node.

(1, (2, f), x). // the variable x defined at node 1 followed by its predicate use at node 2. The notation f is for false condition of the predicate node.

(1, (4, t), x). // the variable x defined at node 1 followed by its predicate use at node 4. The notation t is for true condition of the predicate node.

(1, (4, f), x). // the variable x defined at node 1 followed by its predicate use at node 4. The notation f is for false condition of the predicate node.

(1, (5 ,t), x). // the variable x defined at node 1 followed by its predicate use at node 5. The notation t is for true condition of the predicate node.

(1, (5, f), x). // the variable x defined at node 1 followed by its predicate use at node 5. The notation f is for false condition of the predicate node.

(6, (5 ,t), x). // the variable x defined at node 6 followed by its predicate use at node 5. The notation t is for true condition of the predicate node.

(6, (5, f), x). // the variable x defined at node 6 followed by its predicate use at node 5. The notation f is for false condition of the predicate node.

### All DU Paths

It is even bigger set than all uses. For all uses, our concern is to list down the definition clear paths from all definitions to all uses. But to move from a definition to a use, there may be more than one paths. The du path coverages not only includes all the p-uses and c-uses, it also cover all the paths that can lead to a p-use or c-use. For example, say a variable y was defined at node 2 and computationally used and at nodes 7 and 10. All use coverage will list down it as

(2, 7, x)

(2, 10 , x)

But if there are 2 paths to reach from node 2 to node 7 and 3 paths to reach from node 2 to node 10, all DU path will also include those paths thus creating 5 associations instead of 2.