The Non-Computer Science behind Genetic Algorithms

The Non-Computer Science behind Genetic Algorithms

Validating the joy of simulating natural selection and genetics on my computer with tediously explained science.

Image Credit: DALL·E’s attempt at “‘Darwin’s tree of life’ made of dna as a sketch on yellowing paper,” why not? I actually got to see the original on display at the ROM, ages ago. Goosebumps!

This article can function as a companion to Tim John’s work, “Build your own Genetic Algorithm”.

Read this before Tim’s article if you’d like to prime yourself on natural selection and genetics: hopefully, it will enrich the impact of the metaphors adopted on the programming-end. Or, read this article after Tim’s article if you’d like to see how good Tim’s choices of terms were, and how much meaning they embodied. Or, just read this article if you’d like a refresher on natural selection and genetics. Do what you want! I’m not your dad.1

Context

As a young student, I dreaded math questions that were categorized as “problem-solving”. Later as a teacher, I was vindicated somewhat to learn that even among students for whom arithmetic is a strength (not me), word-problems and more open-ended questions are frequent stumbling blocks: re-conceptualizing a paragraph of information into a recognizably-formed equation (or as an apparently arbitrary yet uniquely correct series of recognizably-formed equations) requires skills of interpretation, judgement, and strategic familiarity that are not necessary for other practice questions. It’s also tricky to gently transition from those innocent-looking equations to said paragraphs of doom, so the experience of encountering one often felt jarring and discouraging in the way that riding over the top of a cliff and then smashing into the face of an opposing cliff before then falling fully to my doom between both cliffs …could feel jarring and discouraging.

Learning to program tricked me into loving problem-solving. Programming, from my neophyte perspective, was about envisioning a desired thing to exist, and then making it exist; moving from point A to B was about solving the problem of my desired thing not existing yet, and the payoff, in addition to the existence of my desired thing, was the sensation of wizardry.

Addictive. Frustrating. Usually fun. Often hard. And, for some types of problems, inescapably sub-optimal. Being a neophyte programmer meant that as I increased my literacy of code, I could see, and try to learn from, other people’s more expert solutions: i.e., those embodying more circumspect and insightful re-conceptualizations of my given problem-space. My best self2 could marvel that so many solutions, so many avenues to success, could exist, and were often far more elegant than anything I had imagined. Those “endless forms most beautiful and most wonderful” were reminiscent of studying natural selection in university: a space boasting multiple viable approaches to success and a resultant panoply of varied and fascinating artifacts of that striving.

It was therefore a joy to understand that computers could simulate that process in general (of course they could), that they could do so to reach and optimize solutions to arbitrary, possibly silly problems (cf. MarI/O), and that the solutions achieved could involve approaches not dreamt of in my philosophy. The pieces fell together for me when Tim shared AI-Junkie’s description of the Genetic Algorithm. Tim recently wrote an extensive article to modernize AI-Junkie’s work and put his own spin on it, and very kindly invited me to gush my amateur evolution and genetics knowledge as part of the introduction: while seeking to make the coding challenge more inviting, he also wanted to foreground the fun of programming a simulation of the natural world, so being as faithful to the associated disciplines’ terminology as possible was a priority. …But, as we worked, and the two halves of the article inevitably expanded, it became clear that joining them would leave potential readers (and especially students) with so much material that the cognitive load of keeping it all straight from start to finish would undermine the enhanced clarity and accessibility he was trying to achieve in the first place.3 So, after ensuring that sufficient science-context could be gleaned from his article to satisfy students with more of a priority on programming, we’ve situated my contribution as a sort of “further reading” for interdisciplinary nerds.

Hello, interdisciplinary nerds. I love you.

Approaching the Genetic Algorithm

Very broadly, genetic algorithms attempt to mimic principles of natural selection and genetics to build optimal solutions to problems. They involve generating lots of randomized approaches, evaluating those approaches for their degrees of success, and then preferentially selecting and combining the best attempts (while discarding the worst) to see if the resulting subsequent batch contains solutions that are even better. Rinse and repeat, sprinkling in degrees of randomness and predictability to taste, until a successful solution, or a solution with a desired degree of efficiency, is achieved.

Because this strategy of solution-building is inspired by—and heavily adopts the metaphors of—academic fields outside of computer science, having a baseline understanding of those fields (and their terminology) should make trying to implement your own genetic algorithms much more intuitive. So, here you go!

Natural Selection

While I’m a dilettante about this stuff, my wife, Carolyn Piccinin, actually maintains professional mastery of the domain as a genetic counsellor. In an better world, she’d be the author of the article you’re reading, but as it stands, we’re all fortunate that she at least went over the text and helped zap my most egregious errors.

(She stridently believes the systematic analogies I use to help with the terminology below would’ve been stronger and clearer if they involved recipe books rather than encyclopaedias, but allows they’re not technically wrong. As a GC and a mighty cook, she’s in a doubly-good position to know. Sorry.)

Charles Darwin and Alfred Russel Wallace described the process of “Evolution by Natural Selection” as a way to make sense of both the stunning variation across organisms seen in the natural world, as well as the subtle but unavoidable patterns that exist between them. Evolution (i.e., change over time) of species had been theorized before Darwin’s time, but lacked a plausible (observable, eventually testable) engine or mechanism. Natural Selection was the proposed mechanism, and demonstrated that the abundant variety of life we observe would emerge directly and necessarily from the interaction between a few simple, acknowledged principles:

  • Heredity: Offspring tend to resemble some combination of their parents (and diminishingly, their more distant ancestors).
  • Mutation: Novel or modified characteristics occasionally show up that do not appear to have belonged to an ancestor, but can still be passed on to offspring like any other trait.
  • Competition: Because resources are limited and typically insufficient to fully satisfy all members of a given population, not all members will reproduce equally—some will thrive and have lots of offspring while others will be less successful by comparison (due to death, inability to find a mate, etc.).
    • Competitive advantage as described by an interaction of an organism’s traits with its environment is often called fitness (as a classic example, if a giraffe having a longer neck means access to more leaves on the local tall trees than fellow giraffes, and better access to leaves is demonstrated to represent a lower likelihood of dying before it reproduces, the giraffe could be said to be “more fit”).

Accepting these principles as plausible features of nature, we can anticipate some overarching consequences by exploring interactions between them:

  • Heredity with Mutation suggests that a population has a capacity for change beyond just an endless recombination of preexisting traits.
  • Heredity with Competition suggests that the winners—those able to become parents before they die—will create a new generation that tends to more resemble themselves (specifically, that has a higher proportion of their traits). Conversely, traits characteristic of the losers—those who died before they reproduced—will fade from the population through subsequent generations.
  • Mutation with Competition suggests that occasionally a novel trait will arise that confers a competitive advantage (e.g., the organism that has that new trait is better able to survive and attract a mate to reproduce because of it). These traits will then enter and proliferate across the population through subsequent generations.

That’s it! In broad strokes, that engine—the consistent interaction of heredity, mutation, and competition—is the explanation for how the traits of populations change over time, or evolve. It’s called “Natural Selection” because the “decision” of which traits propagate in a population over time isn’t made by anyone or anything, but is a natural consequence of individual fitness.4

Genetics

Darwin and Wallace deduced Natural Selection from observable phenomena, and then worked to collect observations that would test, qualify, and ultimately reinforce that framework. Their theory proposed a mechanism for how changes in species could occur over time, and for how species might splinter as they spread out geographically and definitions of fitness could differ locally. But crucially, the mechanisms for heredity and mutation themselves—the field of study that would eventually become genetics—were unknown to them: roughly in parallel with Darwin, Augustinian monk Gregor Mendel was working to quantify and systematize observable patterns of heredity in peas.5 It wasn’t until 1944 that scientists identified the molecule DNA (deoxyribonucleic acid) as the physical medium to carry information from generation to generation67. The mechanics of cellular division, of the (imperfect) reproduction of DNA, of its eventual correspondence with observable, heritable traits emerging from molecular processes inside the cell—this work continues now with ever greater sophistication and specialization as technology and our frameworks of understanding improve.

While natural selection operates at the level of populations, genetics concerns itself with individual members of those populations; some of the foundational ideas are represented in the following terms (organized to work our way from the observable individual organism down to the molecular medium of its heredity):

  • Organism: a discrete entity made up of one or more cells, each of which typically contains an identical copy of a genome (described next).
    • Organisms with enough features in common to be able to mate and produce viable offspring (i.e., child organisms that can similarly mate and produce viable offspring) are said to be members of the same species.8
    • A group of organisms that are members of the same species and are sufficiently local to each other that they can interbreed and compete for the same resources can be said to be a population.
    • Grouping all currently-breeding parents (or the offspring of all currently-breeding parents) within a population into a cohort is the process of describing a generation. Generations in nature often have fuzzy boundaries, but can be conceptually useful as units for tracking the evolution of traits over time.
  • Genome: the complete set of data that represents an organism’s heritable features. In humans, this would be encoded in the full set of an individual’s DNA.
    • Analogy: An individual’s genome is like the information content in a complete encyclopaedia.
    • Identical twins have identical genomes.
  • Chromosome: a subdivision of an individual’s genome into a discrete package. (Humans typically have their genomes packaged into two sets of 23 chromosomes in any given cell)
    • Analogy: An individual’s chromosomes are like the volumes of an encyclopaedia.
  • Gene: a specific region of DNA that represents the instructions for building a particular protein (and ultimately contributes to the expression of a particular observable trait).
    • Analogy: An individual’s genes are like the individual articles in an encyclopaedia.
    • Genes can be viewed as the unit of heredity: when parents produce offspring through sexual reproduction, those offspring contain a combination of DNA (and ultimately, fitness-impacting genes) drawn from the parents. If the offspring eventually become parents themselves, those same genes will have a chance to travel to the subsequent generation, and so on.9
  • Allele: a particular version of a given gene. The trait ultimately expressed is determined by which version—which allele—is present.
    • Analogy: The alleles present that govern an individual’s actual eye colour are like different versions of the same article in different encyclopaedias—they may cover the same topic, but with differences in focus, interpretation, etc.10
  • Nucleotide: one of a finite set of molecules that can be linked together to form DNA. The sequence in which these molecules occur is the data which is copied and passed on in part to offspring, and could therefore be considered the fundamental matter of heredity.
    • Analogy: The nucleotides linked together into an organism’s DNA are like the individual letters in the words in the sentences in the articles in the volumes that make up a complete encyclopaedia.

When discussing an organism’s traits in genetic terms, it’s helpful to know whether we’re working at the scale of the generally observable, or at the scale of the traits’ molecular underpinnings. Two more terms are good to have handy, to this end:

  • Phenotype: an observable, heritable trait of an individual organism.
    • Eye colour is an example of phenotype; which languages are spoken by the organism is not an example of phenotype11.
  • Genotype: the specific genes possessed by an individual that are responsible for the expression of an observable, heritable trait.
    • The presence of specific alleles of eye-colour-coding genes in an individual’s genome is its eye-colour genotype; these will ultimately interact to result in the phenotype of having an associated eye colour.

Onward, to the Genetic Algorithm

The next step will be to see how these concepts map onto genetic algorithms in programming. Have fun!

  1. Taran, you can read the articles in whatever order you’d like, too. I’m a cool dad. I’ll allow it. ↩︎

  2. i.e., when avoiding the bruised ego and self-doubt engendered by witnessing superior execution again and again and again… ↩︎

  3. cf. the foregoing sentence. ↩︎

  4. Conversely, artificial selection occurs when populations are bred with intention to encourage or discourage particular traits (and therefore, “fitness” is externally, arbitrarily dictated). Dog breeds and the modern forms of the fruits and vegetables we eat are classic examples of artificial selection. (Incidentally, artificial selection in humans is called eugenics, and is an endlessly fascinating ethical tire-fire.) ↩︎

  5. Darwin and Mendel don’t appear to have met in person, and while Mendel seems to have sent Darwin his work on heredity in translation, it’s unclear that Darwin was aware of it during his lifetime. Facilitating such a meeting is definitely on my time-travelling bucket-list. ↩︎

  6. This is a massive oversimplification of what goes on mechanically.. It’s like saying “Wikipedia is the website that tracks what people think it’s important to publicly write about.” ↩︎

  7. Watson, Crick, and (increasingly, righteously) Franklin are names that are often associated with the “discovery” of DNA; their contributions in the 1950s have become so overwhelmingly famous because they described the physical structure of the molecule, revealing the elegant mechanism of how it could be replicated despite its incredible complexity. ↩︎

  8. Species in nature are not as distinct as they’re often presented and described, since “viability” of offspring can be a matter of degree (excluding cases like mules—horse/donkey hybrids that are born sterile), and can be subject to geographical boundaries. One of the coolest examples I’ve encountered is the idea of ring speciation: imagine a migrating population that arrives at an impassible barrier like a lake or a mountain, and begins to spread around it. Over multiple generations, local portions of that larger population will be subject to different selection pressures, resulting in local variations building up (and associated, accumulating genetic differences when comparing parts of the population that went one way when it met the barrier, vs. the other). If the expanding population meets up again on the other side of the barrier (i.e., “closing the ring”), it’s possible that members of the two sides will have built up enough differences that they’ll no longer be genetically compatible with one another—they’ll be different species, by this definition. Yet, if you were to take sample organisms at smaller geographic intervals, travelling back around that ring from one end to the other, they would be able to interbreed. Speciation as a gradient! I love it. ↩︎

  9. Richard Dawkins, before his association with atheism, arguably became a household name for his book The Selfish Gene, wherein he explored a fascinating extension of this process and imagined the machinery of organisms—cells, blood, eyes, locomotion, intelligence, tentacles, etc.—as mere vehicles for individual genes to improve their chances of propagation. It was a fun read, and pre-Creationist-beleaguered Dawkins had a spark of eager excitement that came out in his footnotes especially, that I think the vagaries of the world eventually ground away. Alas. ↩︎

  10. By neat extension, different encyclopaedias may be more or less successful in different settings due to these choices… ↩︎

  11. Capacity to speak a language is largely an example of a phenotype (i.e., no cactus is likely to ever speak Urdu no matter how much expert tutelage it has access to, while humans do it all the time). ↩︎