Since GP is a stochastic search algorithm, different runs may yield different results. Because of this, one needs to be very careful when drawing inferences about the system's degree of success from a small set of runs.
It is possible, for example, to run a GP system 10 times on a particular problem, observe that all 10 runs failed to find a solution, and conclude that GP cannot solve the problem. However, if the success probability is, say, 5% with a particular choice of parameters and representation, the probability of doing 10 independent runs and all of them failing is 0.95^10, or almost 60%! So, the failure to solve the problem in these 10 runs should not come as a surprise, even though there is a reasonable chance that you would find a solution if you did more runs.
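The arithmetic behind this example is easy to check. The following is a minimal Python sketch, assuming that runs are independent and share the same success probability; the function names are illustrative and not part of any GP system.

```python
def prob_all_runs_fail(p_success, n_runs):
    """Probability that n_runs independent runs all fail."""
    return (1.0 - p_success) ** n_runs


def runs_needed(p_success, confidence):
    """Smallest number of runs for which the chance of at least one
    success reaches `confidence` (requires 0 < p_success <= 1 and
    0 <= confidence < 1)."""
    n = 1
    while 1.0 - prob_all_runs_fail(p_success, n) < confidence:
        n += 1
    return n


print(round(prob_all_runs_fail(0.05, 10), 4))  # 0.5987
print(runs_needed(0.05, 0.95))                 # 59
```

Under these assumptions, with a 5% success probability it takes 59 runs before the chance of seeing at least one success reaches 95%.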
For precisely this reason, it is very important to do enough runs and use appropriate statistical tests to ensure that conclusions are statistically significant.
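One simple way to quantify how little a small set of runs reveals is to put a confidence interval around the observed success rate. The sketch below uses the Wilson score interval for a binomial proportion; this is one standard choice among several, not a method prescribed here, and the function name and the default z = 1.96 (an approximate 95% level) are assumptions made for illustration.

```python
import math


def wilson_interval(successes, runs, z=1.96):
    """Wilson score interval for a success rate estimated from `runs`
    independent runs; z = 1.96 gives roughly 95% coverage."""
    p_hat = successes / runs
    denom = 1.0 + z * z / runs
    centre = (p_hat + z * z / (2.0 * runs)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / runs + z * z / (4.0 * runs * runs)
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


low, high = wilson_interval(0, 10)
print(low, high)  # the upper end is roughly 0.28
```

Ten failures out of ten runs are therefore compatible with a true success rate anywhere from 0 up to roughly 28%, which comfortably includes the 5% of the earlier example.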
GP runs can often be very time-consuming, especially if the fitness function is computationally expensive. While parallel and distributed computing (see Section 10.4) can significantly speed up the process, tools from the design of experiments literature (Bartz-Beielstein, 2006) can also be used to reduce the number of different runs that are necessary to explore the space in a statistically sound manner.
A common GP application is classification, e.g., evolving a program or function that can classify patient biopsy data into two categories: cancerous or benign. There are numerous pitfalls in this type of work, such as using all the available data as training data, thereby leaving nothing with which to validate the evolved solution on unseen data. There is a broad literature on this and related subjects, and there are numerous tools, such as cross-validation, that one can use when not enough data are available; see, for example, Hastie, Tibshirani, and Friedman (2001). The aim must be to ensure that your results can be trusted to work in the real world, rather than just in the synthetic environment created by the fitness cases you chose.
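As an illustration of the idea, the following is a minimal sketch of k-fold cross-validation in plain Python. The names `k_fold_indices` and `cross_validate` are hypothetical, as are the `evolve` and `score` callables, which stand in for whatever GP system and performance measure are actually used. Each fitness case is held out exactly once, so every score is computed on data the evolved program never saw during training.

```python
import random


def k_fold_indices(n_cases, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(cases, k, evolve, score, seed=0):
    """Average held-out score over k folds.

    evolve(training_cases) -> model    (hypothetical: e.g. one GP run)
    score(model, test_cases) -> float  (hypothetical: e.g. accuracy)
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(cases), k, seed):
        model = evolve([cases[i] for i in train_idx])
        scores.append(score(model, [cases[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

Because GP is stochastic, a single call to `evolve` per fold gives a noisy estimate; repeating the procedure with different seeds is one way to reduce that noise.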