Contents Top Previous Next

The purpose of the first two preparatory steps is to specify the ingredients the evolutionary process can use to construct potential solutions. Because the problem is to find a mathematical function of one independent variable, x, the terminal set (the inputs of the to-be-evolved programs) must include this variable. The terminal set also includes ephemeral random constants drawn from some reasonable range,1 say from -5.0 to +5.0, as described in Section 3.1 . Thus the terminal set, T, is

The statement of the problem does not specify which functions may be employed in the to-be-evolved program. One simple choice for the function set is the four ordinary arithmetic functions: addition, subtraction, multiplication and division. Most numeric regression problems will require at least these operations, sometimes with additional functions such as sin() and log(). We will use the simple function set

where % is protected division as discussed in Section 3.2.1 . Note that the target polynomial can be expressed exactly using the terminal and function sets we have chosen, so these primitives are sufficient (cf. page 56 ) for the quadratic polynomial problem.

The third preparatory step involves constructing the
fitness measure that specifies what the user wants. The high-level goal
of this problem is to find a program whose output is equal to the
values of the quadratic polynomial x^{2}+x+1.
Therefore, the fitness assigned to a particular individual in the
population must reflect how closely the output of an individual program
comes to the target polynomial x^{2}
+ x + 1.

In principle, the fitness measure could be defined in
terms of the mathematical integral of the difference between the
evolved function and the target function. However, for most symbolic
regression problems, it is not practical or even possible to compute
the value of the integral analytically. Thus, it is common to define
the fitness to be the sum of absolute errors
measured at different values of the independent variable
x in the range [-1.0
,+1.0]. In
particular, we will measure the errors for x Î{-1.0
,-0.9
,,0.9
,1.0}.
A smaller value of fitness (error) is better; a fitness (error) of zero
would indicate a perfect fit. With this definition, our fitness is
(approximately) proportional to the area between the parabola
x^{2} +
x + 1 and the curve representing the candidate individual (see
Figure 4.2
for examples).

The fourth step is where we set our run parameters. The population size in this small illustrative example will be just four. The population size for a run of GP typically consists of thousands of individuals, but we will use this tiny population size to keep the example manageable. The crossover operation is commonly used to generate about 90% of the individuals in the population; the reproduction operation (where a fit individual is simply copied from one generation to the next) is used to generate about 8% of the population; the mutation operation is used to generate about 1% of the population; and the architecture-altering operations (see Section 6.1.2 ) are used to generate perhaps 1% of the population. However, because this example involves an abnormally small population of only four individuals, the crossover operation will be used twice (each time generating one individual), which corresponds to a crossover rate of 50%, while the mutation and reproduction operations will each be used to generate one individual. These are therefore applied with a rate of 25% each. For simplicity, the architecture-altering operations are not used for this problem.

In the fifth and final step we need to specify a termination condition. A reasonable termination criterion for this problem is that the run will continue from generation to generation until the fitness (or error) of some individual is less than 0.1. In this contrived example, our example run will (atypically) yield an algebraically perfect solution with a fitness of zero after just one generation.

www.gp-field-guide.org.uk

Contents Top Previous Next