Contents Top Previous Next

10.2 Reducing Cost of Fitness with Caches

In computer hardware it is common to use data caches which automatically hold copies of data locally in order to avoid the delays associated with fetching them from disk or over a network every time they are needed. This can work well when a small amount of data is needed many times over a short interval.

Caches can also be used to store results of calculations, thereby avoiding the re-calculation of data (Handley1994). GP populations have enormous amounts of common code (Langdon1998Langdon and Banzhaf2005Langdon and Poli2008). This is, after all, how genetic search works: it promotes the genetic material of fit individuals. So, typically in each generation we see many copies of successful code.

In many (but by no means all) GP systems, subtrees have no side-effects. This means results pass through a program's root node in a well organised and easy to understand fashion. Thus, if we remember a subtree's inputs and output when it was run before, we can avoid re-executing code whenever we are required to run the subtree again. Note that this is true irrespective of whether we need to run the same subtree inside a different individual or at a different time (i.e., a later generation). Thus, if we stored the output with the root node, we would only need to run the subtree once for any given set of inputs. Whenever the interpreter comes to evaluate the subtree, it needs only to check if the subtree's root contains a cache of the values the interpreter calculated last time, thus saving considerable computation time.

In order to achieve this, however, we need to overcome a problem: not only must the answer be stored, but the interpreter needs to know that the subtree's inputs are the same too. The common practices of GP come to our aid here. Usually every tree in the population is run on exactly the same inputs for each of the fitness cases. Thus, for a cache to work, the interpreter does not need to know a tree's inputs in detail, it need only know which of the fixed set of test cases was used.

A simple means of implementing this type of cache is to store a vector of values returned by each subtree for each of the test cases. Whenever a subtree is created (i.e., in the initial generation, by crossover or by mutations) the interpreter is run and the cache of values for its root node is set. Note this is recursive, so caches can also be calculated for subtrees within it at the same time. Now, when the interpreter is run and comes to a subtree's root node, it will simply retrieve the value it calculated earlier, using the test case's number as an index into the cache vector.

If a subtree is created by mutation, then its cache of values will be initially empty and will have to be calculated. However, this costs no more than it would without caches.

When code is inserted into an existing tree, be it by mutation or crossover, the chance that the new code behaves identically to the old code is normally very small. This means that the caches of every node between the new code and the root node may be invalid. The simplest solution is to re-evaluate them all. This may sound expensive, but the caches in all the other parts of the individual remain valid and can be used when the cache above them is re-evaluated. Thus, in effect, if the crossed over code is inserted at depth d, only d nodes need to be evaluated.

The whole question of monitoring how effective individual caches are, what their hit-rates are, etc. has been little explored. In practice, impressive savings have been achieved by simple implementations, with little monitoring and rudimentary garbage collection. Recent analysis (Ciesielski and Li2004Dignum and Poli2007Langdon and Poli2002Poli et al.2007) has shown that GP trees tend not to have symmetric shapes, and many leaves are very close to the root. This provides a theoretical explanation for why considerable computational saving can be made by using fitness caches. While it is possible to use hashing schemes to efficiently find common code, in practice assuming that common code only arises because it was inherited from the same location (e.g., by crossing over) is sufficient.

As well as the original Directed acyclic graph (DAG) implementation (Handley1994) other work includes (Ciesielski and Li2004Keijzer1996McPhee, Hopper, and Reierson1998?). While so far we have only considered programs where no side effects take place, there are cases where caching can be extended outside this domain. For example, Langdon ( 1998) used fitness caches in evolved trees with side effects by exploiting syntax rules about where in the code the side-effects could lie.

Contents Top Previous Next