The exon mutation problem in evolution revisited

I will once more go through this problem very carefully to verify whether there really is a problem.

The standard figure given for the average mutation rate in all life forms on Earth is λ=0.5*10^-9 mutations per base pair per year. The mutation rate is not constant: it varies between base pairs, between different phyla and so on, but for calculating evolution on a large scale this number is good enough. The average mutation rate can be found from many sources, including the web.

There are 8.7 million species on Earth, and some 1-2 million of them are animals. A reasonable average size of a genome is 10,000 genes, and each gene is about 1000 base pairs (bp). A gene contains non-coding sections (introns) and protein-coding sections (exons). A reasonable average is that there are 8-10 exons in a gene and that an exon is about 100 bp.

In a human cell there are 20,000 genes, each with about 10 exons, but it is reasonable to estimate that a human cell produces some 65,000-100,000 different proteins. Assuming that one exon codes one protein, we can say that there are at least 65,000 different exons in a human cell (or 200,000 if you count 10*20,000 genes, which is also reasonable). There are four nucleotides, so the base is 4 in the mathematical sense. Eight mutations create 4^8=65,536, or about 65,000, alternatives. We can conclude that to get the different proteins in a human cell, the exons must have at least 8 mutations (from the hypothetical first cell). As the mutations are spread over the whole exon of 100 bp, two randomly chosen exons in a human cell differ (on average, and at least) by 8+0.92*8=15.4 mutations. The factor 0.92 accounts for overlap: the 8 mutations of one exon cover 8% of its positions, so about 8% of the mutations in the other exon fall on the same positions.
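The arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch of the counting argument as stated above; the function names are made up for the illustration, and the overlap correction is the simple one used in the text, not a full model of sequence divergence.

```python
BASES = 4           # A, C, G, T
EXON_LENGTH = 100   # bp, the average exon length assumed in the text

def min_changes(target_variants, bases=BASES):
    """Smallest k such that bases**k reaches the target number of variants."""
    k = 0
    while bases ** k < target_variants:
        k += 1
    return k

def expected_difference(k, exon_length=EXON_LENGTH):
    """Difference between two exons that each carry k mutations,
    with the text's correction for mutations landing on the same positions."""
    overlap = k / exon_length          # 8 / 100 = 0.08
    return k + (1 - overlap) * k

k = min_changes(65_000)
print(k, BASES ** k)                       # 8 65536
print(round(expected_difference(k), 2))    # 15.36
```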

We can also calculate from the mutation rate that in 600 million years there is about one mutation in 3.33 base pairs. That is, mλT=3.33*0.5*10^-9*6*10^8=1 when T=6*10^8 years, m=3.33 and λ=0.5*10^-9 mutations per base pair per year. This means that in 600 million years there are about 30 mutations in each exon of 100 bp. Mutations in randomly chosen exons overlap by 30%, so two exons differ on average by 30+0.7*30=51, or roughly 50, mutations.
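The same estimate written out in Python, as a sketch under the text's assumptions (a constant rate λ, 600 million years, exons of 100 bp). The variable names are chosen for the illustration. The last figure comes out as 51, which the text rounds to 50.

```python
RATE = 0.5e-9       # mutations per base pair per year (λ in the text)
YEARS = 6e8         # 600 million years (T in the text)
EXON_LENGTH = 100   # bp

p = RATE * YEARS                        # probability that a given base pair has mutated
bp_per_mutation = 1 / p                 # m in the text, about 3.33
mutations_per_exon = p * EXON_LENGTH    # about 30
# Two exons: the second one adds only the mutations that miss the positions
# already changed in the first one (30% overlap).
difference = mutations_per_exon + (1 - p) * mutations_per_exon

print(round(p, 3), round(bp_per_mutation, 2))
print(round(mutations_per_exon, 1), round(difference, 1))
```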

From these two estimates we can conclude that in humans, and other mammals, exons differ by 16-50 mutations on average.

How realistic is it to get 14 new orders of mammals in 10-15 million years at the beginning of the Cenozoic era?

If the mutations are random, the arrival process is Poisson, but we can also solve the problem with basic combinatorics. We can consider an exon as a system of 100 independent base pairs. The probability of having exactly k mutations in 100 base pairs is just the familiar binomial probability:

Prob = binomial(100 over k) * p^k * (1-p)^(100-k)

where p= λT is the probability of one mutation in a base pair in the measurement time T. The maximum is attained when p=k/100. Setting T=20 million years, we get

p=0.5*10^-9*2*10^7=0.01.

This means that the expected number of mutations in the whole exon of 100 bp is k=100p=1. If we want to have 16-50 mutations in an exon in this time T, it gets rather improbable. Setting k=16 we get

Prob≈10^-13.7

This probability is so small that in order to have one such exon the population size must be about 100,000 billion (10^14). Using T=15 million years makes the problem much worse. Clearly, in 15 million years a new exon that differs from the existing ones by 16 mutations cannot be created by random mutations with the assumed mutation rate.
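The probability above can be reproduced with Python's standard library. The sketch below evaluates both the binomial formula and its Poisson approximation (mean 100p) for k=16; the function names are made up for the illustration. For T=20 million years the Poisson value is roughly 10^-13.8, close to the figure quoted above, while the exact binomial value is somewhat smaller, roughly 10^-14.2. Either way the required population is of the order of 10^14.

```python
import math

def binomial_pmf(n, k, p):
    """Probability of exactly k mutated positions among n independent base pairs."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(mean, k):
    """Poisson approximation of the same probability."""
    return math.exp(-mean) * mean**k / math.factorial(k)

RATE = 0.5e-9       # mutations per base pair per year
EXON_LENGTH = 100   # bp
K = 16              # mutations wanted in one exon

for years in (20e6, 15e6):
    p = RATE * years
    b = binomial_pmf(EXON_LENGTH, K, p)
    q = poisson_pmf(EXON_LENGTH * p, K)
    print(f"T = {years:.0e} y: binomial 10^{math.log10(b):.1f}, "
          f"Poisson 10^{math.log10(q):.1f}, "
          f"population for one hit ~ {1 / q:.1e}")
```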

Selection cannot solve this problem, because if an exon gets a few mutations it stops working and the gene becomes a pseudogene. In order not to lose the original function of the gene, the exon must first have been duplicated, with the mutations then changing one of the copies. Selection cannot work on pseudogenes.

There are a number of possible solutions. The first possible solution is that the mutation rate was much higher, but it would have had to be almost 100 times higher. It is quite possible that during the extinctions that preceded the creation events the mutation rate was higher. These extinctions seem to coincide with the times when the sun passed through the spiral arms of the Milky Way. During those times there were more meteors hitting the Earth, and it is conceivable that the cosmic radiation was also higher. Yet, this cannot be the solution. In order to make it reasonably probable that one exon can have 16 or more mutations, the rate must be increased to a level where all exons will have almost 15 mutations. Though this might create some new useful exons, it would destroy all other genes. It would kill all life on Earth.
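The trade-off described here can be illustrated numerically. The sketch below assumes T=15 million years and scales the mutation rate by a few multipliers, which are illustrative choices and not measured values. For each one it prints the expected number of mutations per 100 bp exon and the probability that a given exon has at least 16. The probability only becomes sizeable once the expected number is itself close to 15, which is the point made above: every exon is then hit about as hard.

```python
import math

def tail_probability(n, k_min, p):
    """Probability of at least k_min mutated positions among n base pairs."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

RATE = 0.5e-9       # baseline mutations per base pair per year
YEARS = 15e6        # assumed time window
EXON_LENGTH = 100   # bp
K_MIN = 16          # mutations wanted in one exon

for multiplier in (1, 10, 20, 100):     # hypothetical rate increases
    p = RATE * YEARS * multiplier
    expected = EXON_LENGTH * p
    prob = tail_probability(EXON_LENGTH, K_MIN, p)
    print(f"rate x{multiplier:<3}: expected {expected:5.1f} mutations per exon, "
          f"P(at least {K_MIN}) = {prob:.2e}")
```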

Secondly, it may be that the new exon needs far fewer mutations than 16. According to fossil data, different orders of mammals appeared within 10-15 million years, so the animals in these orders were already different at that time. Naturally proteins, and the exons that code them, differ between species. For instance, 80% of proteins differ between humans and chimpanzees, but mostly the differences are very small. Do they have any clearly different proteins? Mammalian species are mostly quite similar. Mammalian blood is quite similar in different species. All produce milk with variants of the same genes, and all mammals grow hair, not feathers or scales. But if one looks carefully there are some differences. Cats have a protein called Fel d 1. A similar protein is also present in the venom of the slow loris. Slow lorises are primates, cats are carnivores. Carnivores and primates are two mammalian orders that developed at the beginning of the Cenozoic era. As Fel d 1 is a clearly different protein, it requires a clearly different exon, which apparently appeared in two orders at the beginning of the Cenozoic era. Fel d 1 is 50% similar to the mouse ABP protein and is thought to have the same evolutionary origin, see:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5957422/

One example is enough to show that there is at least one clearly different exon. Fel d 1 may be such an example: if Fel d 1 is only 50% similar to its closest related protein, then we may assume that the exons are also only 50% similar. That is a much bigger difference than the 16% difference (16 mutations in 100 bp) that was calculated to be impossible in the time frame of 10-15 million years. Cats did not develop immediately at the beginning of the Cenozoic, but even setting T=60 million years and k=50 in the binomial formula yields a very small probability.
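To put a number on "very small", the binomial formula can be evaluated with T=60 million years and k=50. This is a sketch under the same assumptions as before (100 independent base pairs, the average rate λ); the function name is made up for the illustration. The result comes out at roughly 10^-48.

```python
import math

def binomial_pmf(n, k, p):
    """Probability of exactly k mutated positions among n independent base pairs."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

RATE = 0.5e-9       # mutations per base pair per year
YEARS = 60e6        # T = 60 million years
EXON_LENGTH = 100   # bp
K = 50              # a 50% different exon

p = RATE * YEARS    # 0.03
prob = binomial_pmf(EXON_LENGTH, K, p)
print(f"p = {p:.2f}, Prob = 10^{math.log10(prob):.1f}")
```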

It seems to me that the second solution is not correct: cats got the exon for Fel d 1 in some way and it is indeed a clearly different exon. It developed from the same proto-exon as mouse ABP and in this way evolution is correct, but it could not have developed with random mutations in the given time frame.

The third solution is that new protein-coding exons were inserted into the genomes of new mammalian orders by viruses. It has been known for some time that retroviral insertions have changed the human genome:

https://www.ncbi.nlm.nih.gov/pubmed/14666532

This possibility may always be there, but only the extinction of the previous species gave the new, and presumably worse, species a chance to improve. Viruses or bacteria may have developed exons in pseudogenes by random mutations. They had the time for it and potentially large populations, though the figure mentioned earlier, 8.7 million species, also covers these forms of life.

The last solution is that exons stay functional all the time while they mutate from one form to another, very different form. This is the original idea in Darwinism: evolution proceeds in small steps and natural selection guides the steps. This solution is very difficult to accept: it means that a protein is useful for one task in the beginning and for another task in the end, and that in every intermediate step the mutated protein is useful for something. What further makes this solution highly improbable is that genes are known to mutate into very different genes by first getting duplicated and the duplicate then mutating, most probably without being functional in each step. This is, for instance, the way the gene producing alpha-lactalbumin is assumed to have developed from c-lysozyme, see e.g. the Wikipedia entry

https://en.wikipedia.org/wiki/Alpha-lactalbumin

For these reasons I do not consider it at all possible that an exon changes gradually, staying functional at each step, and still accumulates many (say 16 or more) mutations.

Possibly the third solution is the least improbable, but all of these solutions seem to depend on a great deal of good luck. There are two problems: how to get enough mutations in the given time, and how to get mutations that are useful for some purpose. There can be solutions to the first problem, but to the second one there does not seem to be any. Darwin proposed natural selection as the solution to the second problem, but it cannot work on pseudogenes.

The problem remains. The evolution theory needs something else than random mutations, or am I still mixed up? No, could not be. These calculations are so simple. There is a problem. Maybe some researcher will invent a solution.
