Koza (1992) was the first to use genetic programming to perform compression. He considered, in particular, the lossy compression of images. The idea was to treat an image as a function of two variables (the row and column of each pixel) and to use GP to evolve a function that matches as closely as possible the original. One can then use the evolved GP tree as a lossy compressed version of the image, since it is possible to obtain the original image by evaluating the program at each row-column pair of interest. The technique, which was termed programmatic compression, was tested on one small synthetic image with good success. Programmatic compression was further developed and applied to realistic data (images and sounds) by Nordin and Banzhaf (1996).
Iterated Functions System (IFS) are important in the domain of fractals and the fractal compression algorithm. Lutton, Levy-Vehel, Cretin, Glevarec, and Roll (1995a, b) used genetic programming to solve the inverse problem of identifying a mixed IFS whose attractor is a specific binary (black and white) image of interest. The evolved program can then be taken to represent the original image. In principle, this can then be further compressed. The technique is lossy, since rarely the inverse problem can be solved exactly. No practical application or compression ratio results were reported in (Lutton et al., 1995a,b). Using similar principles, Sarafopoulos ( 1999) used GP to evolve affine IFSs whose attractors represent a binary image containing a square (which was compressed exactly) and one containing a fern (which was achieved with some error in the finer details).
Wavelets are frequently used in lossy image and signal compression. Klappenecker and May (1995) used GP to evolve wavelet compression algorithms (internal nodes represented conjugate quadrature filters, leaves represented quantisers). Results on a small set of real-world images were impressive, with the GP compression outperforming JPEG at all compression ratios.
The first lossless compression technique (Fukunaga and Stechert, 1998) used GP to evolve non-linear predictors for images. These were used to predict the gray level a pixel will take based on the gray values of a subset of its neighbours (those that have already been computed in a row-by-row and column-by-column scan of the image array). The prediction errors together with the model's description represent a compressed version of the image. These were compressed using the Huffman encoding. Results on five images from the NASA Galileo Mission database were very promising with GP compression outperforming some of the best human-designed lossless compression algorithms.
In many compression algorithms some form of pre-processing or transformation of the original data is performed before compression. This often improves compression ratios. Parent and Nowe (2002) evolved pre-processors for image compression using GP. The objective of the pre-processor was to reduce losslessly the entropy in the original image. In tests with five images from the Canterbury corpus, GP was successful in significantly reducing the image entropy. As verified via the application of bzip2, the resulting images were markedly easier to compress.
In (Krantz, Lindberg, Thorburn, and Nordin, 2002) the use of programmatic compression was extended from images to natural videos. A program was evolved that generates intermediate frames of video sequence, where each frame is composed by a series of transformed regions from the adjacent frames. The results were encouraging in the sense that a good approximation to frames was achieved. While a significant improvement in compression was achieved, programmatic compression was very slow in comparison with standard compression methods, the time needed for compression being measured in hours or even days. Acceleration in GP image compression was achieved in (He, Wang, Zhang, Wang, and Fang, 2005), where an optimal linear predictive technique was proposed which used a less complex fitness function.
Recently Kattan and Poli (2008) proposed a GP system called GP-ZIP for lossless data compression based on the idea of optimally combining well-known lossless compression algorithms. The file to be compressed was divided into chunks of a predefined length, and GP was asked to find the best possible compression algorithm for each chunk in such a way as to minimise the total length of the compressed file. The compression algorithms available to GP-ZIP included arithmetic coding, Lempel-Ziv-Welch, unbounded prediction by partial matching, and run length encoding among others. Experimentation showed that when the file to be compressed is composed of heterogeneous data fragments (as it is the case, for example, in archive files), GP-zip is capable of achieving compression ratios that are significantly superior to those obtained with other compression algorithms.