If you're not getting the results you want, it is important to take the time to dig around in the populations and see what is actually being evolved. For example, if you're using ADFs because you think that your problem would benefit from a modular solution, examine the individuals that you're evolving. Are they using ADFs? (Sometimes the result-producing branch simply will not refer to the ADFs at all.) Are they using them in a modular way? Are ADFs being used multiple times? Do the ADFs encapsulate some interesting logic, or are they just renaming an input variable? If, on the other hand, you're using grammatical evolution, are your evolved individuals using your grammar as you expected? Or is the grammar in fact biasing the system in an undesirable and unexpected way? Similar questions can be asked for almost any flavour of GP: think about your goals and expectations, and explore your populations to see to what degree those are being met.
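Some of the ADF questions above can be answered mechanically once the population has been dumped to disk. The following is a minimal sketch under assumed conventions: a hypothetical representation in which a tree is either a terminal (a string or float) or a tuple `(function_name, child, ...)`, and each individual is a dict holding an `"rpb"` tree plus one body tree per ADF name (e.g. `"ADF0"`). It reports how many result-producing branches call an ADF at all, how many call one more than once, and how many ADF bodies are a single terminal (and so at best rename an input).

```python
from collections import Counter

# Hypothetical representation: a tree is a terminal (str or float) or a
# tuple (function_name, child1, child2, ...).

def count_calls(tree, names, counts=None):
    """Count how often each function name in `names` is called in `tree`."""
    if counts is None:
        counts = Counter()
    if isinstance(tree, tuple):
        op, *children = tree
        if op in names:
            counts[op] += 1
        for child in children:
            count_calls(child, names, counts)
    return counts

def is_trivial_adf(body):
    """True if the ADF body is a single terminal, i.e. it merely renames
    an argument (or returns a constant)."""
    return not isinstance(body, tuple)

def adf_report(population, adf_names=("ADF0",)):
    """Summarise ADF usage across a population of hypothetical dicts of the
    form {"rpb": tree, "ADF0": tree, ...}. Only calls made directly from
    the result-producing branch are counted."""
    n = len(population)
    using = reused = 0
    trivial = Counter()
    for ind in population:
        calls = count_calls(ind["rpb"], set(adf_names))
        if sum(calls.values()) > 0:
            using += 1
        if any(c > 1 for c in calls.values()):
            reused += 1
        for name in adf_names:
            if is_trivial_adf(ind[name]):
                trivial[name] += 1
    return {
        "frac_using_adfs": using / n,
        "frac_reusing_adfs": reused / n,
        "frac_trivial": {name: trivial[name] / n for name in adf_names},
    }
```

A high `frac_trivial` or a low `frac_using_adfs` would suggest that the ADFs are not providing the modularity you were hoping for, which is exactly the kind of discrepancy worth investigating by hand.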
Similarly, it can be valuable to look at how your population changes over time in more detail than the standard plot of fitness vs. time provides. You might look at the distribution of tree sizes during your run, or the distribution of fitness values. The distribution of fitness values can reveal something about the structure of the search space as seen by your GP system. If it is dominated by a few distinct values separated by large gaps, then jumping those gaps may be a major challenge for your system, and this may be the cause of poor performance.
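Both distributions are cheap to compute each generation. The sketch below assumes the same hypothetical tuple-based tree representation (a terminal, or a tuple `(function_name, child, ...)`) and a plain list of fitness values. It returns a histogram of tree sizes and, for the fitness values, the distinct values together with the widest gap between neighbouring values expressed as a fraction of the total fitness range; a fraction close to 1 indicates the "few values, big gaps" situation described above.

```python
from collections import Counter

def tree_size(tree):
    """Number of nodes in a tuple-encoded tree (hypothetical representation)."""
    if isinstance(tree, tuple):
        return 1 + sum(tree_size(child) for child in tree[1:])
    return 1

def size_histogram(population):
    """Map each tree size to the number of individuals of that size."""
    return Counter(tree_size(tree) for tree in population)

def fitness_gaps(fitnesses):
    """Return (distinct sorted fitness values, widest gap between neighbours,
    widest gap as a fraction of the full fitness range)."""
    distinct = sorted(set(fitnesses))
    if len(distinct) < 2:
        return distinct, 0.0, 0.0
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    widest = max(gaps)
    return distinct, widest, widest / (distinct[-1] - distinct[0])
```

Recording these summaries for every generation, rather than just the best and mean fitness, makes it possible to see when (and whether) the population crosses a large gap.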
While it is important to look inside your populations, the time and effort required to do so is effectively a function of how much information is recorded. Computer algorithms can easily generate enormous amounts of data, especially if you produce a detailed log of the events and individuals generated during your runs. Consequently, processing those results may become a challenging data-mining exercise, and finding good ways to visualise such large data sets can be extremely valuable. While there are a handful of papers that specifically address visualisation, e.g., (Daida, Hilss, Ward, and Long, 2005; Pohlheim, 1999), and even the occasional workshop (Smith, Bullock, and Bird, 2002), most visualisation techniques are scattered through the literature, and we are unaware of any comprehensive review. We can, however, offer a bit more guidance on program visualisation.
An obvious (but easy to forget) advantage of GP is that we create visible programs. This is not necessarily true of other approaches. So, when presenting GP results, one should routinely consider making a figure that contains the whole evolved program. The dot component of the Graphviz package can be particularly helpful in this regard; Figure 6.2 is an example of a tree diagram generated with a simple dot input file. The program lisp2dot can help with the conversion from Lisp-style expressions to dot input files.
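To make the conversion concrete, here is a small, hypothetical converter in the same spirit; it is not lisp2dot itself, and its function names and output layout are our own assumptions. It parses a Lisp-style S-expression into nested lists and emits a dot file in which each node gets a unique identifier (so repeated labels such as two `x` leaves stay distinct) and each parent is linked to its children.

```python
def tokenize(text):
    """Split a Lisp-style expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse tokens (consumed from the front) into nested lists of atoms."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return node
    return token

def to_dot(expr):
    """Return dot source drawing the expression as a tree."""
    lines = ["digraph program {", "  node [shape=plaintext];"]
    counter = 0

    def emit(node):
        nonlocal counter
        my_id = f"n{counter}"
        counter += 1
        # A list is (function arg1 arg2 ...); anything else is a terminal.
        label, children = (node[0], node[1:]) if isinstance(node, list) else (node, [])
        lines.append(f'  {my_id} [label="{label}"];')
        for child in children:
            lines.append(f"  {my_id} -> {emit(child)};")
        return my_id

    emit(parse(tokenize(expr)))
    lines.append("}")
    return "\n".join(lines)
```

Writing `to_dot("(+ x (* 2.5 y))")` to a file such as `program.dot` and running `dot -Tpng program.dot -o program.png` should produce a tree diagram of the program.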
As the evolved trees can often be very large, it is usually helpful to perform at least some basic simplifications such as removing excess significant digits in constants and combining constant terms. Naturally, after cleaning up the evolved program, one should make sure it still works; you should also clearly indicate in any presentation or write-up that the program you're presenting has been cleaned and is not the actual tree generated by GP.
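The two basic clean-ups mentioned above, combining constant terms and trimming significant digits, can be sketched as follows. This assumes the hypothetical tuple representation used earlier, floats as the only constants, and a function set of `+`, `-` and `*`. Constants are folded at full precision first and rounded only afterwards, so that rounding errors do not accumulate. The `evaluate` helper supports the "make sure it still works" step by letting you compare the original and cleaned programs on the same inputs. Note that this only folds subtrees made entirely of constants; for example, `(+ (+ x 1.0) 2.0)` is left unchanged.

```python
import math

# Assumed function set; constants are floats, variables are strings.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def round_sig(x, digits=3):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - int(math.floor(math.log10(abs(x)))))

def fold_constants(tree):
    """Replace every all-constant subtree by its value (full precision)."""
    if not isinstance(tree, tuple):
        return tree
    op, *children = tree
    children = [fold_constants(c) for c in children]
    if op in OPS and all(isinstance(c, float) for c in children):
        return OPS[op](*children)
    return (op, *children)

def round_constants(tree, digits=3):
    """Trim every constant in the tree to `digits` significant digits."""
    if isinstance(tree, tuple):
        return (tree[0], *(round_constants(c, digits) for c in tree[1:]))
    if isinstance(tree, float):
        return round_sig(tree, digits)
    return tree

def evaluate(tree, env):
    """Evaluate a tree given a dict mapping variable names to values."""
    if isinstance(tree, tuple):
        op, *children = tree
        return OPS[op](*(evaluate(c, env) for c in children))
    if isinstance(tree, str):
        return env[tree]
    return tree
```

For instance, `round_constants(fold_constants(("+", "x", ("*", 2.0, 3.14159265))))` yields `("+", "x", 6.28)`; evaluating both versions on the fitness cases shows how much accuracy the clean-up has cost.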
There are methods to simplify expressions automatically (e.g., in Mathematica and Emacs). However, since in general there is an exponentially large number of equivalent expressions, automatic simplification is hard. Another approach is to run GP as a multi-objective evolutionary algorithm, so that program size is minimised alongside error (cf. Chapter 9).
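At the heart of the multi-objective approach is Pareto dominance. The following minimal sketch assumes each individual is summarised by an `(error, size)` pair, both to be minimised; it is only one way of treating size as a second objective, not necessarily the specific method described in Chapter 9. Individuals on the returned front are those for which no other individual is at least as accurate and at least as small while being strictly better in one respect.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated (error, size) pairs, preserving input order.

    A point never dominates itself, so no identity check is needed, and
    exact duplicates are all kept."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Given `[(0.1, 50), (0.1, 20), (0.5, 5), (0.6, 5), (0.05, 100)]`, the front is `[(0.1, 20), (0.5, 5), (0.05, 100)]`: the 50-node program is dropped because a 20-node one matches its error, and the small but less accurate programs survive, which is what keeps compact solutions in play.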
In some cases the details of the trees are less important than their general size and shape. Daida et al. (2005) presented a particularly useful set of visualisation techniques for this situation. These techniques allow one to see the size and shape of individual trees as well as an aggregate view of entire populations. Figure 13.1, for example, shows the impact of size and depth limits on the size and shape of trees in two different runs with very similar average sizes and depths. The plots make it clear, however, that the shapes of the resulting trees were quite different.
Figure 13.1: Visualisation of the size and shape of the entire population of 1,000 individuals in the final generation of runs using a depth limit of 50 (on the left) and a size limit of 600 (on the right). The inner circle is at depth 50, and the outer circle is at depth 100. These plots are from (Crane and McPhee, 2005) and were drawn using the techniques described in (Daida et al., 2005).
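When a plot is not practical, a crude numerical proxy for shape can still be informative. The sketch below is not the technique of Daida et al. (2005), which draws every node on a circular lattice. It simply counts nodes at each depth, using the same hypothetical tuple representation as before, and averages those counts over a population. Two trees of equal size can have very different profiles: a bushy tree has many nodes at shallow depths, while a spindly one spreads a few nodes over many levels.

```python
from collections import Counter

def nodes_per_depth(tree, depth=0, counts=None):
    """Count nodes at each depth of a tuple-encoded tree (root at depth 0)."""
    if counts is None:
        counts = Counter()
    counts[depth] += 1
    if isinstance(tree, tuple):
        for child in tree[1:]:
            nodes_per_depth(child, depth + 1, counts)
    return counts

def population_profile(population):
    """Average number of nodes at each depth across a population."""
    total = Counter()
    for tree in population:
        total.update(nodes_per_depth(tree))
    n = len(population)
    return {depth: total[depth] / n for depth in sorted(total)}
```

For example, `("+", ("*", "x", "y"), ("-", "x", "y"))` and `("+", "x", ("+", "x", ("+", "x", "y")))` both have 7 nodes, but their profiles are `{0: 1, 1: 2, 2: 4}` and `{0: 1, 1: 2, 2: 2, 3: 2}` respectively.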