Understanding the informational content of evolutionary processes

Evolution is the process by which populations of organisms change over generations in response to their environment.

Each organism has a fingerprint, shaped by a reward system defined by natural selection.

Fitness can be assessed as the number of grandchildren of each individual.

In this context, we can think of fitness as a function of the DNA sequence and environment.

\((DNA, Env)\rightarrow(fitness)\)

How does natural selection change the information content of the genome? If a specimen dies early in life, it will have less time to produce offspring. To know how natural selection operates, we only need to know which descendants survived to have children. This can be expressed with a binary value for each descendant, meaning the information content is at most one bit per offspring.
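To see why one bit is only an upper bound, here is a minimal sketch that computes the Shannon entropy of a binary "survived or not" outcome. The survival probabilities are made-up values for illustration, not figures from MacKay.

```python
import math

def binary_entropy(p):
    """Entropy, in bits, of a binary outcome that occurs with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical survival probabilities, chosen only for illustration.
for p in (0.5, 0.9, 0.99):
    print(p, round(binary_entropy(p), 3))
```

The entropy reaches one bit only when survival is a coin flip; the more predictable the outcome, the less information each offspring's fate carries.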

As in nature, the reward system does not communicate which qualities lead to more or fewer children. For this reason, the bits can be highly redundant within a generation, since similarly unfit individuals all tend to have fewer offspring.

MacKay asks: 'how many bits per generation are acquired by the species as a whole by natural selection?' As a worked case, take the period since Australopithecines and apes diverged, roughly 4M years ago.

Assumptions:

  • 10-year generation time (400,000 generations)
  • population of \(10^9\) humans (each receiving 2 bits, one from each descendant)

The total number of bits of information responsible for the changes our species has undergone over the past 4M years is

(number of generations) \(\times\) (bits per parent) \(\times\) (starting population)

\(400{,}000 \times 2 \times 10^9 = 8\times 10^{14}\)
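The arithmetic can be checked in a few lines. This is just the back-of-the-envelope estimate restated with the assumptions listed above; the variable names are our own.

```python
# Assumptions taken from the list above.
years_since_divergence = 4_000_000
generation_time_years = 10
population = 10**9
bits_per_parent = 2

generations = years_since_divergence // generation_time_years
total_bits = generations * bits_per_parent * population

print(generations)           # number of generations
print(f"{total_bits:.1e}")   # upper bound on bits acquired
```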

There is a catch: because the bits are redundant, the formula above overestimates the number of bits.

We continue with a crude, yet more accurate, model based on these assumptions:

  • A gene with two defects is more defective than a gene with one.
  • The more defective genes an organism has, the lower its fitness.

This is an oversimplification, but let's work with it.

Model

\(N\) individuals with genome size of \(G\) bits.

Variation arises from:

  • Mutation
  • Recombination (sex)

Truncation selection then keeps the \(N\) fittest children, who pass their genes on to the next generation.

Each individual has a genotype represented by a string \(x\) of \(G\) bits, each in either a good state \(x_g=1\) or a bad state \(x_g=0\). Fitness \(F(x)\) is the sum of the bits: \(F(x)=\sum_{g=1}^{G}x_g\).

Each element of \(x\) can be read as a nucleotide, the basic building block of nucleic acids (including DNA). Alternatively, one could choose alleles to be the main components of \(x\).

Having defined \(F(x)\) this way, we are assuming fitness to be a linear function of the genome: a change in one nucleotide results in a small, fixed difference in fitness, regardless of the state of the other sites (there are no interactions between them). The normalised fitness is \(f(x) = F(x)/G\).
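As a minimal sketch, the two fitness measures can be written directly from their definitions. The example genotype is hypothetical and only there to show that flipping a single bit shifts \(F(x)\) by exactly one, whatever the other bits are.

```python
def fitness(x):
    """F(x): the number of good bits (ones) in genotype x."""
    return sum(x)

def normalised_fitness(x):
    """f(x) = F(x) / G, where G is the genome length."""
    return fitness(x) / len(x)

x = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical genotype with G = 8
flipped = x.copy()
flipped[1] = 1                  # turn one bad bit into a good one

print(fitness(x), normalised_fitness(x))
print(fitness(flipped) - fitness(x))
```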

Models of Variation

Our goal is to compare asexual and sexual reproduction. The first is based on mutation alone, whereas the second involves recombination of two genomes.

Mutation

At each generation \(t\), every individual produces 2 children.

Mutations occur independently at each bit with probability \(m\), the probability of a bit flip.
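One generation of the mutation-only model can be sketched as follows. The values of \(G\), \(N\) and \(m\), the random initial population and the number of generations are arbitrary choices for illustration; the selection step is the truncation selection described in the model section.

```python
import random

def fitness(x):
    """F(x): number of good bits in genotype x."""
    return sum(x)

def mutate(x, m, rng):
    """Copy x, flipping each bit independently with probability m."""
    return [1 - bit if rng.random() < m else bit for bit in x]

def asexual_generation(population, m, rng):
    """Each parent has 2 mutated children; the N fittest children survive."""
    n = len(population)
    children = [mutate(x, m, rng) for x in population for _ in range(2)]
    children.sort(key=fitness, reverse=True)  # truncation selection
    return children[:n]

rng = random.Random(0)        # fixed seed so runs are repeatable
G, N, m = 100, 50, 0.01       # assumed values, not taken from the text
population = [[rng.randint(0, 1) for _ in range(G)] for _ in range(N)]

for t in range(20):
    population = asexual_generation(population, m, rng)

mean_f = sum(fitness(x) for x in population) / (N * G)
print(mean_f)                 # mean normalised fitness after 20 generations
```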

Recombination

The organisms in this model are haploid. The \(N\) individuals are arranged in \(M=N/2\) couples. To keep the population steady over time, we assume each couple produces \(C=4\) children. Of these \(MC\) progeny, \(N\) are selected to form the next generation.

We assume the children's genotypes are independent given their parents' genotypes. Each child obtains its genotype \(z\) by random crossover of its parents' genotypes, \(x\) and \(y\): \(P(z_g = y_g) = P(z_g = x_g) = 1/2\).

Each generation thus yields \(MC = 2N\) progeny, of whom the \(N\) selected go on to procreate.
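The recombination step can be sketched in the same spirit. Pairing individuals at random is an assumption of this sketch, as are the values of \(G\) and \(N\); the crossover rule and \(C=4\) come from the description above, and \(N\) is taken to be even.

```python
import random

def fitness(x):
    """F(x): number of good bits in genotype x."""
    return sum(x)

def crossover(x, y, rng):
    """Each bit of the child comes from x or y with probability 1/2."""
    return [xg if rng.random() < 0.5 else yg for xg, yg in zip(x, y)]

def sexual_generation(population, rng, c=4):
    """Pair up N parents into N/2 couples, give each couple c children,
    then keep the N fittest children (truncation selection)."""
    n = len(population)
    parents = population[:]
    rng.shuffle(parents)                      # random pairing (assumption)
    children = []
    for i in range(0, n - 1, 2):
        x, y = parents[i], parents[i + 1]
        children.extend(crossover(x, y, rng) for _ in range(c))
    children.sort(key=fitness, reverse=True)
    return children[:n]

rng = random.Random(0)        # fixed seed so runs are repeatable
G, N = 100, 50                # assumed values, not taken from the text
population = [[rng.randint(0, 1) for _ in range(G)] for _ in range(N)]

for t in range(20):
    population = sexual_generation(population, rng)

print(sum(fitness(x) for x in population) / (N * G))
```

No mutation is applied here, so any gain in fitness comes purely from reshuffling and selecting the variation already present in the initial population.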

(continues in the next blog post)