a simple gene

April 23, 2014

I had a goodly and productive day today, by which I mean that I:

  • Had a thing I wanted to do
  • Did it.

How rewarding! The “thing” was:

  • The fizzbuzz of genetic algorithms.

There were a lot of intermediate steps, of course, like these:

  • talk to Alex about it a bunch
  • scratch head
  • talk to Alex some more

and these:

  • fail.
  • fail.
  • notice I’m using completely wrong function for thing; fail.
  • succeed.

All in all, it struck a balance.

Now, I don’t mean I wrote fizzbuzz as a genetic algorithm (and I’m going to space here for a second to avoid nerd sniping myself: lalala….) but rather asked the question- what is the simplest implementation that I can think of of a genetic algorithm, which qualifies as such?

Here is what you need to make it work:

  • A defined type of “gene,” that is organized into
  • “generations,” or sets of genes of a constant size.
  • A metric for evaluating the fitness of a given gene, and
  • A way to change genes slightly, whether through mutation, recombination, or some other method.

I’ll come back to those momentarily, but Alex suggested this as a good first go:

A "gene" is a string of n length of bits (1 or 0).
They can be randomized or seeded with a starting gene (such as all 0's)

A "generation" is 100 genes.
The fitness of a gene is denoted by the number of 1's in the string.

Write a mutative algorithm that moves towards the base case of all 1's.

Let’s start with generating the basic material: one gene! This solution is in Clojure:

(defn new-gene []
   ; (repeatedly 50 #(rand-int 2)))
   (take 50 (repeat 0)))

I’ve given a length of 50. The commented out line would generate random bits, and the line at the bottom would generate a seq of all 0’s. Now to dump those into our first generation:

(defn generation []
   (repeatedly 100 new-gene)))

In English: “Take the first one hundred elements in the sequence generated by repeatedly calling the new gene function.” This gives us a list of 100 genes of 50 bits each.

Now we need to evaluate each gene for fitness; remember that our metric is simply “the more 1’s the better.”

(defn fitness [gene]
  (count (filter #(= 1 %) gene)))

Matt helpfully points out that in this case, the above function could be rewritten as:

(defn fitness [gene]
  (apply + gene))

Which achieves the same result by summing up all of the values in the gene instead of just counting the 1’s.

And it is straightforward to map this function over all the genes in our generation:

(defn evaluate-generation [generation]

Hmmm… good, but not exactly what we want just yet. We now have a list of the fitness of each gene, but this information does us no good in a vacuum… let’s try reordering the original generation by fitness, instead. By default, sort-by goes least to greatest, so I will also reverse that output.

(defn evaluate-generation [generation]

Or, even better:

(defn evaluate-generation [generation]
  (sort-by fitness > generation))

Now we’re getting somewhere.

Since we have an ordered-by-fitness list of genes, we want to cull away the least fit and allow the most fit to procreate forward. A good way to accomplish this is to simply discard the bottom half, and since this is a lazy sequence we can take just the first half of it to achieve the desired effect:

(defn evaluate-generation [generation]
    (/ (count generation) 2)
    (sort-by fitness > generation)))

Once again, English: “Take the top half of the generation of genes sorted by fitness.”

Now comes the fun part! We need to introduce some type of variation in the proceeding generation. There are plenty of ways to do this, but I want to start with the most fundamental: introducing random mutations.

To start- a function to perform that action on a single bit:

(defn mutate [g]
  (if (= (rand-int 100) 1)
      (bit-flip g 0)

Paraphrased: “If the result of calling a random number between 1 and 100 is 1, then “bit-flip” the input. If it is a 1, it becomes a 0. If it is a 0, it becomes a 1.”

Now to map that function over an entire gene:

(fn [gene]
    (fn [g] mutate g)

I have made these functions anonymous because I am going to go ahead and skip a step by mapping the above function over an entire generation all at once, like this.

(defn mutate-gen [generation]
   #(map mutate %)

This will output a new list of the same length of its input, each of whose bit’s will have had a 1100 chance of being flipped.

To get a new generation, therefore, the whole instruction set looks like this:

Start from somewhere (in our case, all zeros.)
evaluate all genes for fitness and discard the bottom scoring half.
replace the discarded half with a mutated version of the top half,
where each bit in the top half had a 1/100 chance of flipping.
repeat until the "objective" has been reached.

Alex points out that under most use cases for this type of algorithm, the “objective” is not stricly defined. It is always, in essence, to “maximize fitness.” Because the fitness of these genes is evaluated on such a simple metric, the objective is obvious (MOAR ONES!).

Here is the function that strings all of these together and returns a new generation, mutations and all:

(defn tng [generation]
  (let [ng (evaluate-generation generation)]
    (concat ng
            (mutate-gen ng))))

And here is the last function, which recurs until a perfect output of all 1’s is generated:

(defn go [generation]
  (if (= (fitness (first (evaluate-generation generation))) 50)
    (println (first (evaluate-generation generation)))
      (println (first (evaluate-generation generation)))
      (recur (tng generation)))))

Notice I had a debugging line in there to print out the fitness of each gene.

That’s that! If you are curious what the output looks like, you can see a round here. Each line is the top scoring gene of that generation. Computation time varies, but it generally takes about the same number of generations to reach the objective.

Github repo here; Clojure advice to optimize for readability and idiom is greatly welcome.