In linear GP, programs are linear sequences of instructions, such as the one in Figure 7.1. The number of instructions can be fixed, meaning that every program in the population has the same length, or variable, meaning that different individuals can be of different sizes. In the following sections we discuss reasons for using linear GP (Section 7.1.1). We then provide more details on the different flavours of linear GP (Section 7.1.2). Finally, we briefly describe the main genetic operations for linear GP (Section 7.1.3).
There are two different reasons for trying linear GP. Firstly, almost all computer architectures represent computer programs in a linear fashion, with neighbouring instructions normally executed in consecutive time steps (although control structures, jumps and loops may change the execution order). So, why not evolve linear programs? This led Banzhaf (1993), Perkis (1994) and Openshaw and Turton (1994) to try linear GP.
Secondly, computers do not naturally run tree-shaped programs, so interpreters or compilers have to be used as part of tree-based GP. In contrast, by evolving the binary bit patterns actually executed by the computer, linear GP can avoid this computationally expensive machinery and can run several orders of magnitude faster. This desire for speed drove Nordin (1994), Nordin et al. (1999), Crepeau (1995) and Eklund (2002).
7.1.2 Linear GP Representations
As discussed in Section 2.1, it is possible to use a linear representation in tree-based GP. When doing so, however, the linear structures are simply flattened representations of the trees. Thus, in the linear structure one can still identify the root node, its children, and the rest of the tree structure. In such a system, instructions typically communicate via their arguments.
The semantics of linear GP are quite different, however. In linear GP, instructions typically read their input(s) from one or more registers or memory locations and store the results of their calculations in a register. For example, they might take the form shown in Figure 7.2. This means that instructions in linear GP all have equivalent roles and communicate only via registers or memory. In linear GP there is no equivalent of the distinction between functions and terminals inherent in tree-based GP. Also, in the absence of loops or branches, the position of the instructions determines the order of their execution. Typically, this is not the case for the structures representing trees.1
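To make these register-based semantics concrete, the following is a minimal sketch of a linear GP interpreter. The instruction format (op, dest, src1, src2), the register count, and the convention of returning register 0 are illustrative assumptions, not the layout of any particular linear GP system.

```python
# Minimal sketch of linear GP semantics on a simple register machine:
# each instruction reads from registers and writes to a destination
# register; instructions execute strictly in sequence.
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run(program, inputs, n_registers=4):
    """Execute a list of (op, dest, src1, src2) instructions in order."""
    reg = [0.0] * n_registers
    reg[:len(inputs)] = inputs           # inputs pre-loaded into registers
    for op, dest, a, b in program:
        reg[dest] = OPS[op](reg[a], reg[b])
    return reg[0]                        # answer left in register 0

# r0 = (x + y) * x, with x in r0 and y in r1
prog = [("add", 2, 0, 1),   # r2 = r0 + r1
        ("mul", 0, 2, 0)]   # r0 = r2 * r0
print(run(prog, [3.0, 4.0]))  # -> 21.0
```

Note that every instruction has the same shape; there is nothing corresponding to the function/terminal distinction of tree-based GP.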
The instructions in linear GP may or may not represent executable machine code. That is, there are essentially two flavours of linear GP: machine code GP, where each instruction is directly executable by the CPU, and interpreted linear GP, where each instruction is executable by some higher-level virtual machine (typically written in an efficient language such as C or C++). When the instructions are actual machine code, then the order of the elements of the representation shown in Figure 7.2 is determined by the particular computer architecture used, and the corresponding data must be packed into bit fields of appropriate sizes. The overhead of packing and unpacking data can be avoided, however, when one is using virtual machine instructions, since then the designer of a GP system has complete freedom as to how the virtual machine will interpret its instructions.
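The packing and unpacking of instruction fields into bit fields might be sketched as follows. The field widths (an 8-bit opcode and three 8-bit register operands in a 32-bit word) are invented for illustration; a real encoding is dictated by the target CPU's architecture.

```python
# Sketch of packing an instruction's fields into a 32-bit word, as a
# machine code GP system must do before the CPU can execute it.
def pack(opcode, dest, src1, src2):
    """Pack four 8-bit fields into one 32-bit instruction word."""
    assert all(0 <= f < 256 for f in (opcode, dest, src1, src2))
    return (opcode << 24) | (dest << 16) | (src1 << 8) | src2

def unpack(word):
    """Recover the four 8-bit fields from a 32-bit instruction word."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF,
            (word >> 8) & 0xFF, word & 0xFF)

word = pack(0x01, 2, 0, 1)
assert unpack(word) == (0x01, 2, 0, 1)
```

An interpreted system can skip this step entirely and store the fields directly, e.g. as a tuple, which is the freedom referred to above.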
If the goal is execution speed, then the evolved code should be machine code for a real computer rather than some higher-level language or virtual-machine code. This is why Nordin (1994) started by evolving machine code for SUN computers and Crepeau (1995) targeted the Z80. The linear GP of Leung, Lee, and Cheang (2002) was designed for novel hardware, but much of the GP development had to be run in simulation whilst the hardware itself was under development.
The Sun SPARC has a simple 32-bit RISC architecture, which makes it easier to design genetic operations that manipulate its machine code. Nordin (1997) wrapped each machine code GP individual (which was a sequence of machine instructions) inside a C function. Each of the GP program's inputs was copied from one of the C function's arguments into one of the machine registers. As well as the registers used for inputs,2 a small number (e.g., 2-4) of other registers are used as scratch memory to store partial results of intermediate calculations. Finally, the GP simply leaves its answer in one of the registers. The external framework uses this as the C function's return value.
Since Unix was ported onto the x86, Intel's complex instruction set, which was already standard with Windows-based PCs, has had almost complete dominance. Seeing this, Nordin ported his Sun RISC linear GP system onto Intel's CISC. Various changes were made to the genetic operations to ensure that the initial random programs are made only of legal Intel machine code and that mutation operations, which act inside the x86's 32-bit word, respect the x86's complex sub-fields. Since the x86 has instructions of different lengths, special care has to be taken when altering them. Typically, several short instructions are packed into each 4-byte word. If there are any bytes left over, they are filled with no-operation codes. In this way, best use is made of the available space, without instructions crossing 32-bit boundaries. Nordin's work led to Discipulus (Foster, 2001), which has been used in applications ranging from bioinformatics (?) to robotics (Langdon and Nordin, 2001) and bomb disposal (Deschaine, Hoover, Skibinski, Patel, Francone, Nordin, and Ades, 2002).
Note that execution speed is not the only reason for using linear GP. Although interpreted linear programs are slower than machine-code programs, an interpreted linear GP system can be more efficient than an interpreted tree-based system. Also, a simple linear structure lends itself to rapid analysis. Brameier and Banzhaf (2001) showed that a linear program can be easily scanned and any "dead code" it contains can be removed. In some ways the search space of linear GP is also easier to analyse than that of trees (Langdon, 1999b, 2002a, b, 2003a; Langdon and Banzhaf, 2005). For example, Langdon (2006) and Langdon and Poli (2006, ?, 2008) have used (in simulation) two simple architectures, called T7 and T8, for several large scale experiments and for the mathematical analysis of Turing complete GP. For these reasons, it makes sense to consider linear virtual-machine code GP even when using languages like Java that are typically run on virtual machines; one can in fact use a virtual machine (like that of Leung et al. (2002)) to interpret the evolved byte code (Harvey, Foster, and Frincke, 1999; Lukschandl, Borgvall, Nohle, Nordahl, and Nordin, 2000).
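The ease of dead-code detection in a linear program can be illustrated with a sketch in the spirit of Brameier and Banzhaf's analysis: a single backward scan from the output register marks the effective instructions. The (op, dest, src1, src2) instruction format and the single output register are illustrative assumptions.

```python
# Sketch of intron (dead code) detection in a linear program: scan
# backwards, keeping only instructions whose destination register is
# still needed by a later effective instruction or by the output.
def effective_instructions(program, output_reg=0):
    needed = {output_reg}
    keep = []
    for instr in reversed(program):
        op, dest, a, b = instr
        if dest in needed:
            keep.append(instr)
            needed.discard(dest)       # this value is defined here...
            needed.update((a, b))      # ...so its inputs become needed
    keep.reverse()
    return keep

prog = [("add", 2, 0, 1),   # effective: r2 feeds the final mul
        ("sub", 3, 1, 1),   # dead: r3 never reaches r0
        ("mul", 0, 2, 0)]
print(effective_instructions(prog))   # the sub instruction is removed
```

A comparable analysis on trees requires walking the whole tree structure, whereas here one linear pass suffices.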
7.1.3 Linear GP Operators
The typical crossover and mutation operators for linear GP ignore the details of the machine code of the computer being used. For example, crossover may choose two crossover points at random in each parent and swap the code between them. Since the crossed-over fragments are typically of different lengths, such a crossover may change the programs' lengths, cf. Figure 7.3. Since computer machine code is organised into 32- or 64-bit words, the crossover points occur only at the boundaries between words. Therefore, a whole number of words, containing a whole number of instructions, is typically swapped over. Similarly, mutation operations normally respect word boundaries and generate legal machine code. However, linear GP lends itself to a variety of other genetic operations. For example, Figure 7.4 shows homologous crossover. Many other crossover and mutation operations are possible (Langdon and Banzhaf, 2005).
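The variable-length two-point crossover just described might be sketched as follows, treating programs simply as Python lists of whole instructions so that cut points always fall on instruction boundaries, as they must in machine code.

```python
# Sketch of two-point linear GP crossover: a randomly chosen fragment
# of parent1 is replaced by a randomly chosen fragment of parent2, so
# the offspring's length may differ from either parent's.
import random

def two_point_crossover(parent1, parent2, rng=random):
    """Return one child; cut points fall only between instructions."""
    i1, j1 = sorted(rng.sample(range(len(parent1) + 1), 2))
    i2, j2 = sorted(rng.sample(range(len(parent2) + 1), 2))
    return parent1[:i1] + parent2[i2:j2] + parent1[j1:]

p1 = list("ABCDEF")          # stand-ins for whole instructions
p2 = list("uvwxyz")
child = two_point_crossover(p1, p2)
print(child)                 # length generally differs from 6
```

Mutation can be handled similarly, e.g. by replacing one randomly chosen instruction with a freshly generated legal one.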
Figure 7.3: Typical linear GP crossover. Two instructions are randomly chosen in each parent (top two genomes) as cut points. The code fragment excised from the first parent is then replaced with the code fragment excised from the second to generate the child (lower chromosome).
Figure 7.4: Discipulus's "homologous" crossover (Foster, 2001; Francone et al., 1999; Nordin et al., 1999). Crossover is performed on two parents (top two programs) to yield two offspring (bottom). The two crossover points are the same in both parents, so the excised code does not change its position relative to the start of the program (left edge), and the child programs have the same lengths as their parents. Homologous crossover is often combined with a small amount of normal two-point crossover (Figure 7.3) to introduce length changes into the GP population.
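Homologous crossover, by contrast, uses the same pair of cut points in both parents, so each offspring keeps its parent's length. A sketch, again treating programs as lists of instructions (cut points are drawn within the shorter parent, an assumption for illustration):

```python
# Sketch of homologous linear GP crossover: both parents are cut at
# the same positions, so swapped fragments have equal length and the
# children inherit their parents' lengths.
import random

def homologous_crossover(parent1, parent2, rng=random):
    """Return two children, each the same length as one parent."""
    n = min(len(parent1), len(parent2))
    i, j = sorted(rng.sample(range(n + 1), 2))
    child1 = parent1[:i] + parent2[i:j] + parent1[j:]
    child2 = parent2[:i] + parent1[i:j] + parent2[j:]
    return child1, child2

p1 = list("ABCDEF")
p2 = list("uvwxyz")
c1, c2 = homologous_crossover(p1, p2)
assert len(c1) == len(p1) and len(c2) == len(p2)
```

Because fragment positions are preserved, code at a given position tends to keep its role across generations, which is the analogy with biological homology.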
In a compiling genetic programming system (Banzhaf, Francone, and Nordin, 1996) the mutation operator acts on machine code instructions and is constrained to "ensure that only instructions in the function set are generated and that the register and constant values are within predefined ranges allowed in the experimental set up". On some classification problems Banzhaf et al. (1996) reported that performance was best when using crossover and mutation in equal proportions. They suggested that this was due to the GP population creating "introns" (blocks of code that do not affect fitness) in response to the crossover operator, and that these were subsequently converted into useful genetic material by their mutation operator.