When big changes to the parameters appear to make little difference, this can point to problems with the domain representation and fitness measure. Alternatively, it may be that the problem is simply too difficult, and no change is likely to make a significant difference.
Suppose you are not making much progress during a set of runs. One natural reaction is to sweep the parameter space, doing runs with a variety of different parameter settings in the hope of finding a better collection of values. But what if changing the parameter values really does not have much impact? That may mean that GP simply cannot gain traction given your current representation of the problem domain and fitness function. You might, therefore, reconsider how the problem is posed to GP. If the representation and fitness function make the problem essentially a search for a needle in a haystack, GP will mostly be lost wandering among highly sub-optimal solutions, and altering parameter values is unlikely to help.
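A parameter sweep of this kind can be organised as a grid of settings with several independent runs per setting. The sketch below assumes a hypothetical `run_gp` function standing in for a real GP run; the particular parameter names, value ranges, and the simulated fitness it returns are illustrative only.

```python
import itertools
import random

def run_gp(pop_size, crossover_rate, mutation_rate, seed):
    # Placeholder for a real GP run returning the best fitness found.
    # Simulated here so the sketch is self-contained and runnable.
    rng = random.Random(seed)
    return rng.random() * crossover_rate + mutation_rate

# A small grid of candidate parameter settings (illustrative values).
grid = {
    "pop_size": [100, 500, 1000],
    "crossover_rate": [0.7, 0.9],
    "mutation_rate": [0.01, 0.1],
}

# Average the best fitness over several seeds for each combination,
# since single GP runs are highly stochastic.
results = {}
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    fitnesses = [run_gp(seed=s, **params) for s in range(5)]
    results[combo] = sum(fitnesses) / len(fitnesses)

best = max(results, key=results.get)
```

If the averaged results for all twelve combinations are roughly the same, that flatness is itself informative: it suggests the representation or fitness function, not the parameter values, is the bottleneck.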
Note that essentially the same symptoms are observed if the problem is simply beyond the capabilities of your computing resources. For example, if solutions are exceptionally rare, then unless there are fitness gradients guiding GP towards them, finding any solution will likely be beyond the capacity of current computer technology.
How can one distinguish which is the cause of the lack of success? Is it a bad choice of representation and fitness, or is it just an extremely hard problem? To answer these questions, it is important to look at what happens as the population size is varied. Even in the absence of fitness guidance, GP will search. In fact, it will perform a sort of random exploration of the search space. It may not be a particularly rational exploration -- we know, for example, that GP with subtree crossover tends to oversample and re-sample short programs -- yet it is still a form of stochastic search. Thus, if the problem is solvable, one may expect that, as the population size is progressively increased, sooner or later some variation in the fitness of programs will appear. This may be sufficient for evolution to work with, particularly if we help it by improving the representation and fitness function. If, instead, nothing interesting happens as the population size is increased, then perhaps you do not have enough computing power to solve the problem as posed, or the problem has no solution.
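The diagnostic described above can be sketched as follows: generate random populations of increasing size and measure the spread of their fitness values. Everything here is a toy assumption -- the `fitness` function models an extreme needle-in-a-haystack landscape, and `random_program` stands in for a real initialisation method such as ramped half-and-half.

```python
import random
import statistics

def fitness(program):
    # Hypothetical needle-in-a-haystack landscape: almost every
    # program scores 0.0; only the rare "needle" scores 1.0.
    return 1.0 if program == 0 else 0.0

def random_program(rng):
    # Stand-in for random program generation (e.g. ramped half-and-half).
    return rng.randrange(10_000)

rng = random.Random(42)
spreads = {}
for pop_size in (100, 1_000, 10_000, 100_000):
    population = [random_program(rng) for _ in range(pop_size)]
    scores = [fitness(p) for p in population]
    # The population standard deviation of fitness measures how much
    # variation selection has to act on; zero means no gradient at all.
    spreads[pop_size] = statistics.pstdev(scores)
```

If `spreads` remains at or near zero even for the largest populations, random sampling is finding no fitness variation, which is the symptom of an unworkable representation or an effectively unsolvable problem as posed.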