Genetic Algorithms
I decided to write a couple of posts about a topic I have been working on a couple of years ago and I still find very interesting: Genetic Algorithms.
Genetic Algorithms are very general methods, based on the same mechanisms the biological theory of evolution is built upon, that can be successfully applied to solve optimization or search problems, without having to resort to a well known solution or a fully described algorithm.
Very generally speaking, genetic algorithms can be seen as a form of heuristic that approximates a possible solution for a problem by improving it step by step. Evolution in biology depicts how a population of simple organisms eventually can evolve in functioning complex life forms, whereas genetic algorithms apply the same incremental approach to the solution of mathematically defined problems.
Genetic Algorithms can also be made to mimic intelligent behavior and can be seen as a form of Artificial Intelligence. Even if the algorithm's process is basically "dumb" (and blind), its emergent behavior in response to changes in input can be easily mistaken as "awareness" by a human user.
How do Genetic Algorithms work?
Given a large population of organisms from the same species, evolution tells us how this population adapts to its environment thanks to small progressive changes in the genetic material (the genome) of each organism. Because of the mechanism commonly called "natural selection", organisms that have better characteristics will have a higher chance to breed (they are "fitter" than competing individuals) and therefore their genetic material will be passed on to their descendants. After many generations, "better" characteristics become more common, while genes that cause disadvantages slowly disappear. Anyway, even with this crude simplification of how evolution works (there's much more to it, just check out Wikipedia), the mechanism can be easily adapted to a computer program.
A population of solutions
Genetic Algorithms work by assuming that a large collection (a population) of candidate solutions for a certain problem can be generated and represented. The representation of those solutions can be chosen in several ways. For instance, suppose we are trying to find the maximum point of a two-dimensional function: in this case, each point expressed in bidimensional coordinates is a candidate solution and can be simply represented as a vector of 2 real values (the x and y values).

The initial population of solutions will be randomly generated and therefore all those "candidate points" will be scattered around on the 2d plane. Good solutions will be selected at each generation, thus generating progressively better solutions, until one or more individuals will finally be close enough to the maximum point (0.0, 0.0).
Fitness
Once the format used to describe the solutions has been defined (the two-values vector in the example above), a way to determine the "optimality" of each solution must be found. This is usually called the fitness function, that takes a solution as input and returns a value that tells how close it is to the best solution. The range of fitness values and how they are computed varies from algorithm to algorithm and from problem to problem. Anyway, in our example we could simply evaluate the function using the candidate's coordinates. Since we are searching the function's maximum point, a higher value always means that the solution is "fitter" than other ones with lower values.
Fitness functions can be defined in any way you'd like. They will radically impact how the genetic algorithm works and what solutions will be returned. Ideally a fitness function should be computed very quickly, but in some applications you can use a human user to assess the "goodness" of each solution: the evolutionary process will be very slow, but the mechanisms will work nonetheless.
Breeding new generations
At each iteration of the algorithm you'll have a complete population of many candidate solutions and an evaluation of their fitness. In order to generate the next generation, solutions must be able to "breed" new ones. This is usually done by first selecting two candidates and then letting them produce two new "offspring solutions", until a complete new generation has been breeded. The selection process can be done in many different ways, but one of the most common is the "roulette wheel selection":

In the picture above each individual solution has a probability of being picked, proportional to its fitness value. Fitter solutions will be picked more often than worse ones. Other schemes include "tournament" selection, or roulette wheels tweaked to certain degrees (for instance introducing "elitism").
The offspring solutions are created with two mechanisms also taken from how chromosomes change their DNA in biology: crossover and mutation.

When "crossing over", the genes of two solutions mix up and create a new solution that takes some parts from the each parent. Usually a simple crossing point is chosen randomly and the genes are copied in from both parents, but more complex schemes are possible with multiple crossing points.
When a solution "mutates", a random gene is changed to another random value. Mutations happen with a very low frequency and, as in real life, are usually "destructive" for the individual (that is, its fitness value usually decreases). However, since the crossover mechanism only mixes up genes from existing solutions, mutations are needed to create diversity and to inject new solutions into the population.

As seen in the picture above, without mutation the solution pool could wrongly converge to a "local optimal solution" (the yellow dots). A successful mutation can create completely random solutions, leading to "unexplored" solutions that cannot be reached only via crossover once the population has converged.
When am I done?
This process described above repeats, generation after generation, until you reach an acceptable solution. You can use a threshold fitness value or a minimum number of generations to determine when to stop your algorithm. The best solution in the population can then be used as the solution of your original problem. However, Genetic Algorithms are heuristics and not always return optimal results: your "best" solution may be sub-optimal, plain wrong, or worse... But the good thing about these algorithms is their rapid adaptation to input parameters and that, even with a limited number of generations, they are able to find an "acceptable" solution in most scenarios.
Using Genetic Algorithms
Genetic Algorithms can be applied to a wide range of problems. The skeleton of the algorithm is a simple, very generic, iterative method: you can freely define and tweak the definition of fitness function and the stopping condition to suit your needs.
Besides classical applications (like the knapsack problem mentioned on Wikipedia), Genetic Algorithms have been applied successfully to many interesting fields, like aerodynamics studies in wind tunnels, the design of electromagnetic antennas and so on.
AI for videogames is another interesting field where genetic algorithms aren't commonly used yet, perhaps because they are deemed to be too unreliable and slow to work in a real-time constrained system. Roughly one year ago I have been working on a sample strategic game demo that did indeed challenge a human player using a Genetic Algorithm, and it did work quite well. More about that in a future post...




Good review... Make me remember great time at S.T.I.