There is a widespread belief that computer programs are fragile and that any change to any bit in them will cause them to stop working. This is fostered by the common knowledge that a small typing mistake by a human programmer can sometimes introduce a troublesome bug into a program. Programmers know from painful experience, however, that far from proving immediately fatal, errors can lay hidden for years. Further, not all errors are created equal. Some are indeed critical and must be dealt with immediately, while others are rare or largely inconsequential and so never become a major priority. The worst are arguably the severe bugs that rarely express themselves, as they can be extremely difficult to pin down yet still have dire consequences when they appear.
In summary, there is no such thing as a perfect (non-trivial) human-written program and all such programs include a variety of errors of different severity and with a different frequency of manifestation.9
This sort of variability is also very common in GP work. It provides the sort of toehold that evolution can exploit in the early generations of GP runs. The population of programs just needs to contain a few which move vaguely in the right direction. Many of their offspring may be totally blind or have no legs, just so long as a few continue to slime towards the light. Over generations evolution may hopefully cobble together some useful features from this initially unpromising ooze. The results, however, are unlikely to be perfect or pretty. If you as a GP engineer insist on only accepting solutions that are beautifully symmetric and walk on two legs on day one, you are likely to be disappointed. As we have argued above, even human-written programs often only approximate their intended functionality. So, why should we not accept the same from GP?
If you accept this notion, then it is important to provide your system with some sort of gradient upon which to act, allowing it to evolve ever better approximations. It is also important to ensure that your test environment (usually encapsulated in the fitness function) places appropriate emphasis on the most important features of the space from a user perspective. Consider a problem with five test cases, four of which are fairly easy and consequently less important, with the fifth being crucial and quite difficult. A likely outcome in such a setting is that individuals that can do the four easier tasks, but are unable to make the jump to the fifth. There are several things you could try: 1) weighting the hard task more heavily, 2) dividing it up in some way into additional sub-tasks, or 3) changing it from being a binary condition (meaning that an individual does or does not succeed on the fifth task) to a continuous condition, so that an individual GP program can partially succeed on the fifth task. The first of these options is the simplest to implement. The second two, however, create a smoother gradient for the evolutionary process to follow, and so may yield better results.