5.2.1 Is Mutation Necessary?
Mutation was used in early experiments on the evolution of programs (e.g., Bickel and Bickel, 1987; Cramer, 1985; Fujiki and Dickinson, 1987). It was not, however, used in (Koza, 1992) and (Koza, 1994), as Koza wished to demonstrate that mutation was not necessary and that GP was not performing a simple random search. This has significantly influenced the field, and mutation is often omitted from GP runs. While mutation is not necessary for GP to solve many problems, O'Reilly (1995) argued that mutation -- in combination with simulated annealing or stochastic iterated hill climbing -- can perform as well as crossover-based GP in some cases. Nowadays, mutation is widely used in GP, especially in modelling applications. Koza also advises the use of a low level of mutation; see, for example, (Koza, Bennett, Andre, and Keane, 1996b).
Comparisons of crossover and mutation suggest that including mutation can be advantageous. Chellapilla (1997b) found that a combination of six mutation operators performed better than previously published GP work on four simple problems. Harries and Smith (1997) also found that mutation-based hill climbers outperformed crossover-based GP systems on similar problems. Luke and Spector (1997) suggested that the situation is complex, and that the relative performance of crossover and mutation depends on both the problem and the details of the GP system.
5.2.2 Mutation Cookbook
In linear bit-string GAs, mutation usually consists of random changes in bit values. In contrast, in GP there are many mutation operators in use, and multiple types of mutation are often beneficially used simultaneously (e.g., see (Kraft, Petry, Buckles, and Sadasivan, 1994) and (Angeline, 1996)). We describe a selection of mutation operators below:
Subtree mutation replaces a randomly selected subtree with another randomly created subtree (Koza, 1992, page 106). Kinnear (1993) defined a similar mutation operator, but with a restriction that prevents the offspring from being more than 15% deeper than its parent.

Size-fair subtree mutation was proposed in two forms by Langdon (1998). In both cases, the new random subtree is, on average, the same size as the code it replaces. The size of the random code is given either by the size of another random subtree in the program, or is chosen at random in the range [l/2, 3l/2] (where l is the size of the subtree being replaced). The first of these methods samples uniformly in the space of possible programs, whereas the second samples uniformly in the space of program lengths. Experiments suggested that there was far more bloat (cf. Section 11.3.1, page 224) with the first mutation operator.

Node replacement mutation (also known as point mutation) is similar to bit-string mutation in that it randomly changes a point in the individual. In linear GAs the change would be a bit flip. In GP, instead, a node in the tree is randomly selected and randomly changed. To ensure the tree remains legal, the replacement node has the same number of arguments as the node it is replacing; see, e.g., (McKay, Willis, and Barton, 1995, page 488).

Hoist mutation creates a new offspring individual which is a copy of a randomly chosen subtree of the parent. Thus, the offspring will be smaller than the parent and will have a different root node (Kinnear, 1994a).

Shrink mutation replaces a randomly chosen subtree with a randomly created terminal (Angeline, 1996). This is a special case of subtree mutation where the replacement tree is a terminal. As with hoist mutation, it is motivated by the desire to reduce program size.

Permutation mutation selects a random function node in a tree and then randomly permutes its arguments (subtrees).
Koza (1992) used permutation in one experiment (page 600) where it was shown to have little effect. In contrast, Maxwell (1996) had more success with a mutation operator called swap, which is simply a permutation mutation restricted to binary non-commutative functions.

Mutating constants at random: Schoenauer, Sebag, Jouve, Lamy, and Maitournam (1996) mutated constants by adding random noise from a Gaussian distribution. Each change to a constant was considered a separate mutation.

Mutating constants systematically: a variety of potentially expensive optimisation tools have been applied to fine-tune an existing program by finding the "best" values for the constants within it. Indeed, STROGANOFF (Iba, Sato, and de Garis, 1995b; Nikolaev and Iba, 2006) optimises each tree modified by crossover. Clever mechanisms are employed to minimise the computation required.
The approach of McKay et al. (1995, page 489) is more in keeping with traditional GP and uses a mutation operator that operates on terminals, replacing input variables by constants and vice versa. In this approach, "whenever a new constant is introduced [...] a non-linear least squares optimisation is performed to obtain the 'best' value of the constant(s)". Schoenauer, Lamy, and Jouve (1995) also used a mutation operator that affects all constants in an individual, where "a numerical partial gradient ascent is achieved to reach the nearest local optimum". Finally, Sharman, Esparcia Alcazar, and Li (1995) used simulated annealing to update numerical values (which represented signal amplification gains) within individuals.
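To make the cookbook above concrete, here is a minimal sketch of three of these operators -- subtree mutation, node replacement (point) mutation, and Gaussian noise applied to constants -- on a toy tree representation. The nested-list encoding, the function set FUNCS, and all helper names are illustrative assumptions, not taken from any of the systems cited above.

```python
import random

# Toy program trees: a tree is either a terminal (variable name or float
# constant) or a list [operator, child, child, ...]. All assumptions here
# (function set, terminal set, helper names) are for illustration only.
FUNCS = {"+": 2, "-": 2, "*": 2}
TERMS = ["x", "y", 1.0, 2.0]

def random_tree(depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    op = random.choice(list(FUNCS))
    return [op] + [random_tree(depth - 1) for _ in range(FUNCS[op])]

def tree_size(t):
    """Count the nodes in a tree."""
    return 1 if not isinstance(t, list) else 1 + sum(tree_size(c) for c in t[1:])

def subtree_points(t, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, t
    if isinstance(t, list):
        for i, child in enumerate(t[1:], start=1):
            yield from subtree_points(child, path + (i,))

def replace_at(t, path, new):
    """Return a copy of t with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    copy = list(t)
    copy[path[0]] = replace_at(copy[path[0]], path[1:], new)
    return copy

def subtree_mutation(t, max_depth=3):
    """Replace a randomly chosen subtree with a fresh random subtree."""
    path, _ = random.choice(list(subtree_points(t)))
    return replace_at(t, path, random_tree(max_depth))

def point_mutation(t):
    """Node replacement: swap one node for another of the same arity,
    keeping the tree legal (and its size unchanged)."""
    path, node = random.choice(list(subtree_points(t)))
    if isinstance(node, list):
        same_arity = [f for f in FUNCS if FUNCS[f] == FUNCS[node[0]]]
        return replace_at(t, path, [random.choice(same_arity)] + node[1:])
    return replace_at(t, path, random.choice(TERMS))

def gaussian_constant_mutation(t, sigma=0.1):
    """Add Gaussian noise to every numeric constant in the tree
    (in the spirit of Schoenauer et al.'s random constant mutation)."""
    if isinstance(t, list):
        return [t[0]] + [gaussian_constant_mutation(c, sigma) for c in t[1:]]
    return t + random.gauss(0.0, sigma) if isinstance(t, float) else t
```

Note how point mutation preserves tree shape (only node labels change), while subtree mutation can grow or shrink the program -- the distinction behind the size-fair variants discussed above.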