While admirers of linear GP will suggest that machine code GP is the ultimate in speed, all forms of GP can be made faster in a number of ways. The first is to reduce the number of times a program is evaluated.
Many applications find the fitness of programs by running them on multiple training examples. The use of many examples provides an accurate evaluation of a program's quality. However, ultimately the point of fitness evaluation is to make a binary decision -- does this individual get a child or not? The overwhelming proportion of GP's computational effort (or indeed the effort in any evolutionary computation technique) goes into adjusting the probability of this binary decision. However, it is not clear that a high-precision fitness evaluation is always necessary to decide well. Indeed, even when the fitness evaluation is very accurate, most selection algorithms,1 being stochastic, inject noise into the decision of which points in the search space to proceed from and which to abandon. In these cases, reducing the number of times a program is evaluated effectively becomes just an additional source of noise. If a program has already demonstrated that it works poorly compared to the rest of the population on a fraction of the available training data, it is not likely to be chosen as a parent. Conversely, if it has already exceeded many programs in the population after being tested on only a fraction of the training set, then it is likely to be chosen as a parent (Langdon, 1998). In either case, it is apparent that we gain little by running it on the remaining training examples. Teller and Andre (1997) developed these ideas into a useful algorithm called the rational allocation of trials.
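The core of this idea can be sketched as an early-stopping fitness evaluation. The helper below is a minimal illustration, not Teller and Andre's full statistical algorithm: it abandons a program as soon as it can no longer match a reference score even if it were to solve every remaining training case. The hits-style fitness and the `best_so_far` threshold are assumptions of the sketch.

```python
def early_stop_fitness(program, cases, best_so_far):
    """Evaluate `program` on training cases, stopping as soon as it can no
    longer beat `best_so_far` even if it solved every remaining case.
    Returns (hits, number_of_cases_actually_run)."""
    hits = 0
    for i, (inputs, target) in enumerate(cases):
        if hits + (len(cases) - i) <= best_so_far:
            return hits, i  # cannot catch up: abandon the evaluation early
        if program(inputs) == target:
            hits += 1
    return hits, len(cases)
```

A hopeless program is dropped after a handful of cases, while a promising one is still run on the full training set and receives an exact score.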
As well as the computational cost, there are other negative consequences of using all the training data all the time, since doing so gives rise to a static fitness function. In certain circumstances this may encourage the population to evolve into a cul-de-sac where it is dominated by offspring of a single initial program which did well on some fraction of the training cases, but was unable to fit others. A static fitness function can create conditions where good programs that perform moderately well on most portions of the training data have lower fitness than those that do very well in only a few small regions. With high selection pressure, it takes surprisingly little time for the best individual to dominate the whole population.2
Gathercole and Ross (1994, 1997) investigated a number of ways of dynamically changing training samples,3 yielding several interacting effects. Firstly, by using only a subset of the available data, the GP fitness evaluation took less time. Secondly, by changing which examples were being used over time, the evolving population saw more of the training data and so was less liable to overfit a fraction of it. Thirdly, by randomly changing the fitness function, it became more difficult for evolution to produce an overspecialised individual which took over the population at the expense of solutions which were viable on other parts of the training data. Dynamic subset selection (DSS) appears to have been the most successful of Gathercole's suggested algorithms. It has been incorporated into Discipulus (see page 149), and was recently used in a large data mining application (Curry, Lichodzijewski, and Heywood, 2007).
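A minimal sketch of the DSS-style bias gives the flavour of the technique: each training case is weighted by how often it is missed (difficulty) and how long it has gone unselected (age), and a new subset is drawn each generation. The exponents and the draw-with-replacement (hence the final `set`) are simplifications for illustration, not Gathercole and Ross's exact procedure.

```python
import random

def select_subset(difficulty, age, subset_size, d_exp=1.0, a_exp=3.5,
                  rng=random):
    """Draw a training subset with each case's weight proportional to
    difficulty**d_exp + age**a_exp, biasing selection towards cases that
    are often missed or have not been used recently (illustrative only)."""
    weights = [d ** d_exp + a ** a_exp for d, a in zip(difficulty, age)]
    picks = rng.choices(range(len(weights)), weights=weights, k=subset_size)
    return sorted(set(picks))
```

After each generation the difficulties are updated from the population's errors and the ages of unselected cases are incremented, so the fitness function keeps shifting under the population.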
Where each fitness evaluation may take a long time, it may be attractive to interrupt a long-running program in order to let others run. In GP systems which allow recursion or contain iterative elements (Brave, 1996; Langdon, 1998) it is common to enforce a time limit, a limit on the number of instructions executed, or a bound on the number of times a loop is executed. Maxwell (1994) proposed a solution to the question of what fitness to give to a program that has been interrupted. He allowed each program in the population a quantum of CPU time. When the program used up its quantum it was check-pointed.4 In Maxwell's system, programs gained fitness as they ran, i.e., each time a program correctly processed a fitness case, its fitness was incremented. Tournament selection was then performed. If all members of the tournament had used the same number of CPU quanta, then the fitter program was the winner. If, however, one program had used less CPU than the others (and had a lower fitness) then it was restarted and run until it had used as much CPU as the others. Then fitnesses were compared in the normal way.
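The comparison logic just described can be sketched for a two-program tournament. Here each program is simply a dictionary holding its accumulated fitness and the quanta it has consumed; a real system would check-point actual program state, and the tie-breaking rule is an assumption of the sketch.

```python
def quantum_tournament(a, b, run_quantum):
    """Maxwell-style tournament between two check-pointed programs.
    Each program is a dict with 'fitness' and 'quanta' (CPU quanta used);
    run_quantum(p) resumes p for one more quantum, updating both fields."""
    # A program that has used less CPU *and* has lower fitness is resumed
    # until it has consumed as many quanta as its opponent.
    if a['quanta'] < b['quanta'] and a['fitness'] < b['fitness']:
        while a['quanta'] < b['quanta']:
            run_quantum(a)
    elif b['quanta'] < a['quanta'] and b['fitness'] < a['fitness']:
        while b['quanta'] < a['quanta']:
            run_quantum(b)
    # Then fitnesses are compared in the normal way (ties go to `a` here).
    return a if a['fitness'] >= b['fitness'] else b
```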
Teller (1994) had a similar but slightly simpler approach: every individual in the population was run for the same amount of time. When the allotted time elapsed a program was aborted and an answer extracted from it, regardless of whether it had terminated or not. Teller called this an "any time" approach. This suits graph systems like Teller's PADO (Section 7.2.2) or linear GP (Section 7.1), where it is easy to designate a register as the output register. The answer can then be extracted from this register or from an indexed memory cell at any point (including whilst the program is running). Other any-time approaches include those of Spector and Alpern (1995) and Langdon and Poli (2008).
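The register-based answer extraction can be illustrated with a toy linear-GP interpreter. Nothing here is PADO's actual machinery: the instruction format, opcodes, and the choice of register 0 as output are assumptions of the sketch. The point is only that whenever execution is cut off, the output register still holds a usable answer.

```python
def run_anytime(program, registers, budget):
    """Run a toy linear-GP program (a list of (dest, op, src1, src2)
    register instructions) for at most `budget` instruction executions,
    then return whatever is in register 0 -- the designated output
    register -- whether or not the program reached its end."""
    ops = {'+': lambda x, y: x + y, '*': lambda x, y: x * y}
    pc = executed = 0
    while pc < len(program) and executed < budget:
        dest, op, s1, s2 = program[pc]
        registers[dest] = ops[op](registers[s1], registers[s2])
        pc += 1
        executed += 1
    return registers[0]  # an "any time" answer
```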
A simple technique to speed up the evaluation of complex fitness functions is to organise the fitness function into stages of progressively increasing computational cost. Individuals are evaluated stage by stage, with each stage contributing to the overall fitness of a program. However, an individual must reach a minimum fitness value in each stage to be allowed to progress to the next stage and acquire further fitness. Often different stages represent different requirements and constraints imposed on solutions.
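This staging can be sketched directly; the representation of stages as (evaluation function, threshold) pairs is an assumption for the sketch.

```python
def staged_fitness(program, stages):
    """Evaluate `program` through fitness stages of increasing cost.
    `stages` is a list of (evaluate, threshold) pairs: evaluate(program)
    returns that stage's fitness contribution, and the program must score
    at least `threshold` to proceed to the next, more expensive, stage."""
    total = 0
    for evaluate, threshold in stages:
        score = evaluate(program)
        total += score
        if score < threshold:
            break  # fails this stage's requirement: skip costlier stages
    return total
```

A cheap first stage (say, a syntactic or sanity check) thus filters out most poor individuals before the expensive stages are ever run.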
Recently, a sophisticated technique called backward chaining GP has been proposed (Poli, 2005; Poli and Langdon, 2005a, b, 2006a). In GP and other evolutionary algorithms which use tournament selection with small tournament sizes, backward chaining can radically reduce the number of fitness evaluations. Tournament selection randomly draws programs from the population to construct tournaments, the winners of which are then selected. Although this process is repeated many times in each generation, when the tournaments are small there is a significant probability that an individual in the current generation is never chosen to become a member of any tournament. By reordering the way operations are performed, backward chaining GP exploits this. It not only avoids fitness calculations for individuals that are never included in a tournament, but can also achieve higher fitness sooner.
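The saving just described can be illustrated by drawing a whole generation's tournaments before evaluating anyone, and then computing fitness only for individuals that were actually sampled. This sketch shows only that lazy-evaluation step, not the full backward chaining algorithm, which also reorders work across generations; the function name and the identity fitness in the test are illustrative.

```python
import random

def lazy_tournament_selection(pop_size, n_tournaments, t_size, evaluate, rng):
    """Draw all tournaments for a generation up front, then evaluate only
    the individuals that appear in at least one of them.
    Returns (winners, number_of_fitness_evaluations)."""
    tournaments = [[rng.randrange(pop_size) for _ in range(t_size)]
                   for _ in range(n_tournaments)]
    needed = {i for t in tournaments for i in t}       # sampled individuals
    fitness = {i: evaluate(i) for i in needed}         # skip the rest
    winners = [max(t, key=fitness.__getitem__) for t in tournaments]
    return winners, len(needed)
```

With small tournaments (e.g. size 2) a noticeable fraction of a population is never sampled in a generation, so the number of evaluations falls well below the population size.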