The first EDA-type GP system was effectively an extension of PBIL to trees called probabilistic incremental program evolution (PIPE) (Salustowicz and Schmidhuber, 1997; Salustowicz, Wiering, and Schmidhuber, 1998; Salustowicz and Schmidhuber, 1999). In PIPE, the population is replaced by a hierarchy of probability tables organised into a tree (such as the one in Figure 8.1). Each table represents the probability that a particular primitive will be chosen at that specific location in a newly generated program tree. At each generation a population of programs is created based on the current tree of probability tables. The generation of a program begins by choosing a root node based on the probabilities in the root table, and then continues down the hierarchy of probability tables until all branches of the tree are complete (i.e., a terminal has been chosen on each branch). The fitness of these new programs is computed, and the probability hierarchy is updated on the basis of these fitnesses, so as to make the generation of above-average-fitness programs more likely in the next generation.
A positive feature of PIPE is that the probability of choosing a particular primitive can vary with its depth (and, more generally, its position) in the tree. This makes it possible, for example, for terminals to become increasingly probable as a node's depth increases. A limitation of PIPE, however, is that the primitives forming a tree are chosen independently of each other, so it is impossible for PIPE to capture dependencies among primitives. Another limitation is that the maximum size of the generated trees is constrained by the size of the tree of probability tables. Ondas, Pelikan, and Sastry (2005) compared the performance of PIPE and standard tree-based GP on a small set of artificial problems, including a GP version of one-max and a GP version of the fully deceptive trap function. Results suggest that PIPE and standard GP have similar scaling properties, but that standard subtree crossover inherently links neighbouring nodes whereas PIPE does not.
Figure 8.1: Example of a probability tree used for the generation of programs in PIPE. New program trees are created starting from the root node at the top and moving down through the hierarchy. Each node in an offspring tree is selected from the left-hand side of the corresponding table with the probability given on the right-hand side. Each branch of the tree continues to expand until either the tree of probability tables is exhausted or a leaf (e.g., R) is selected.
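The PIPE sampling procedure can be sketched as follows. This is a minimal illustration, not an implementation from the PIPE papers: the primitive set, the uniform initial tables, and all function names are assumptions made here for concreteness.

```python
import random

# Illustrative primitive set (not taken from the PIPE papers):
FUNCTIONS = {"+": 2, "*": 2}   # function primitive -> arity
TERMINALS = ["x", "R"]         # R stands in for an ephemeral random constant

def uniform_table():
    """Initial probability table: all primitives equally likely."""
    prims = list(FUNCTIONS) + TERMINALS
    return {p: 1.0 / len(prims) for p in prims}

def make_table_tree(depth, arity=2):
    """Build the fixed hierarchy of probability tables down to `depth`."""
    kids = [] if depth == 0 else [make_table_tree(depth - 1, arity)
                                  for _ in range(arity)]
    return (uniform_table(), kids)

def sample(table_node, depth):
    """Walk the table hierarchy, drawing one primitive per visited node."""
    table, children = table_node
    if depth == 0:  # table hierarchy exhausted: force a terminal on this branch
        table = {p: table[p] for p in TERMINALS}
    prim = random.choices(list(table), weights=list(table.values()))[0]
    arity = FUNCTIONS.get(prim, 0)  # terminals have arity 0, ending the branch
    return [prim] + [sample(children[i], depth - 1) for i in range(arity)]
```

After each generation, PIPE updates the table entries towards the primitives found in above-average-fitness programs; only the sampling step is sketched here. Note how the maximum program size is bounded by the depth of the table hierarchy, as discussed above.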
Sastry and Goldberg (2003) proposed an algorithm called extended compact GP (eCGP) which effectively extends the eCGA algorithm (Harik, 1999) to trees. Like PIPE, eCGP assumes that all trees will fit within a fixed maximal tree. It partitions the nodes of this maximal tree into groups. The nodes in a group are assumed to be linked, and their co-occurrence is modelled by a full joint distribution table. As with eCGA, the probability of generating a particular tree is given by the product of the probabilities of generating each group of nodes using the groups' joint distributions. An advantage of this system is that, unlike PIPE, it captures dependencies among primitives. However, to the best of our knowledge, this system has only been tested on the two artificial problems used by Ondas et al. (2005) to compare PIPE and GP. Consequently, its behaviour on more typical GP problems is unknown.
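The group-wise sampling idea can be sketched as follows. The partition, the joint tables, and their contents are invented for illustration; they are not taken from the eCGP paper, which also learns the partition itself rather than fixing it in advance.

```python
import random

# Positions 0..6 of a full binary tree of depth 2 (heap-style numbering),
# partitioned into linked groups. Partition and tables are assumptions.
partition = [(0,), (1, 3, 4), (2, 5, 6)]

# One joint distribution table per group: tuple of primitives -> probability.
joint_tables = [
    {("+",): 0.6, ("*",): 0.4},
    {("+", "x", "x"): 0.5, ("*", "x", "1"): 0.5},
    {("+", "1", "x"): 0.7, ("*", "x", "x"): 0.3},
]

def sample_tree():
    """Draw each group jointly from its table, then assemble the full tree."""
    nodes = {}
    for group, table in zip(partition, joint_tables):
        combo = random.choices(list(table), weights=list(table.values()))[0]
        for pos, prim in zip(group, combo):
            nodes[pos] = prim
    return [nodes[i] for i in range(7)]  # heap-style array of the maximal tree
```

Because each group is drawn as a single joint event, primitives within a group are fully dependent on one another, while different groups remain independent; the probability of a whole tree is the product of its groups' entries, as described above.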
Yanai and Iba (2003) proposed an EDA called estimation of distribution programming (EDP) which, in principle, can capture complex dependencies between a node in a program tree and the nodes directly above it or to its left. As with eCGP and PIPE, programs are tree-like and are assumed to always fit within an ideal maximal full tree. A conditional probability table is necessary for each node in such a tree to capture these dependencies. To keep the size of the data structures manageable, only pairwise dependencies between each node and its parent were stored and used. EDP was tested on both the Max problem (Gathercole and Ross, 1995) and the 6-multiplexer. A later hybrid algorithm combining EDP and GP (Yanai and Iba, 2004) showed promise when tested on three symbolic regression problems.
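The pairwise parent-child case can be sketched as follows. The primitive set and the conditional table values are assumptions for illustration only; EDP learns such tables from the fitter programs in the population.

```python
import random

# Illustrative primitives (not from the EDP papers):
ARITY = {"+": 2, "*": 2, "x": 0, "1": 0}

# Unconditional distribution for the root node.
root_table = {"+": 0.5, "*": 0.5}

# cond[parent] is the distribution over a child's primitive given its parent.
cond = {
    "+": {"+": 0.2, "*": 0.2, "x": 0.4, "1": 0.2},
    "*": {"+": 0.1, "*": 0.1, "x": 0.5, "1": 0.3},
}

def sample(parent=None, depth=3):
    """Draw each node conditioned on its parent's primitive (pairwise model)."""
    table = root_table if parent is None else cond[parent]
    if depth == 0:  # boundary of the maximal tree: restrict to terminals
        table = {p: w for p, w in table.items() if ARITY[p] == 0}
    prim = random.choices(list(table), weights=list(table.values()))[0]
    return [prim] + [sample(prim, depth - 1) for _ in range(ARITY[prim])]
```

The richer dependencies mentioned above (on nodes to the left, for instance) would require conditioning each table on more ancestors and siblings, which is exactly what makes the full model too large to store in practice.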
An EDA based on the hierarchical BOA was used as the main mechanism for generating new individuals in meta-optimising semantic evolutionary search (MOSES) (Looks, 2007), which combined multiple strategies and used semantics to restrict and direct the search. BOA was also used to evolve programs by Looks, Goertzel, and Pennachin (2005) using a specialised representation for trees.
More recently, an EDA-based system called N-gram GP (Poli and McPhee, 2008a) was proposed for the evolution of linear GP programs. To some extent, N-gram GP overcomes the difficulties EDAs commonly have in performing local search when using a centralised population model: the system can capture both the local and the global features of the optimal sampling distribution, albeit at the cost of imposing certain other constraints. This makes it possible, for example, for the search to focus on the neighbourhood of a small number of individuals without the need to choose among them. Tests on polynomial symbolic regression problems and the lawnmower problem were very encouraging.
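The sampling side of such a model can be sketched as a trigram generator for linear programs; the instruction set, table shape, and function names below are assumptions made for illustration, and the learning step (updating the trigram counts from fit programs) is omitted.

```python
import random

# Illustrative instruction set for a linear GP machine (an assumption):
INSTRS = ["ADD", "SUB", "MUL", "SWAP"]

def make_uniform_trigrams():
    """Trigram model: (prev2, prev1) -> distribution over the next instruction."""
    return {(a, b): {c: 1.0 for c in INSTRS} for a in INSTRS for b in INSTRS}

def sample_program(trigrams, length):
    """Generate a linear program by repeatedly sampling the next instruction
    conditioned on the previous two."""
    prog = [random.choice(INSTRS), random.choice(INSTRS)]  # seed the first two
    while len(prog) < length:
        table = trigrams[(prog[-2], prog[-1])]
        prog.append(random.choices(list(table),
                                   weights=list(table.values()))[0])
    return prog
```

Because the distribution of each instruction depends only on its two predecessors, the model captures local structure anywhere in the program, rather than tying each probability to a fixed position as PIPE does.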