How can new protein-coding genes be created by mutations?

I ran into a problem in trying to understand how evolution by mutations works and decided to look at it more deeply. My problem is this: a gene has protein-coding parts (exons) and non-coding parts (introns). Non-coding DNA is not only introns. Some of it lies between genes (intergenic), and some of it controls genes: it contains regulatory sequences where proteins called transcription factors bind to turn genes on and off. As a very rough description of how a gene works, there is the control part (in introns and outside the gene) and the protein-coding part.

It seems that a recessive allele has some function of the mother allele turned off. It is recessive because, if the other copy of the gene is the original allele, the function is still performed by that other allele. A dominant allele does more than the mother allele. It is dominant because, even if the other allele does not perform this new function, the dominant allele does. Normally an allele is dominant because a mutation has turned on a function that was earlier turned off.

You see now my problem: a recessive allele has something existing turned off, while a dominant allele usually has some existing function turned on that was earlier off. How do we get totally new functions? Both cases simply turn parts of genes on and off through mutations in the control part.

At some point the protein-coding parts themselves appeared, and that cannot have happened by turning existing protein-coding parts on and off. How can a protein-coding part be created by mutations? I thought that the most natural way is that some existing gene (let's ignore the very beginning and assume there was at least one gene) gets duplicated, but in a slightly faulty way, so that the copy is ineffective. This ineffective gene can accumulate mutations over a long time, as it does not do anything. Finally it gets changed into a working gene and gets activated. But if it is a protein-coding sequence (exon), turning it into a functioning gene seemed to me like having a monkey write a symphony. An exon has maybe 120 base pairs, and humans acquire on average 3 protein-coding mutations per generation. If you knew precisely what you were doing, you could change 120 base pairs into a working code in 40 generations, but that assumes you are a designer.

Mutations are random, so we have to know how many possibilities there are in 120 base pairs and how many of these possibilities are good for humans. Humans currently have 233,785 exons in their DNA, but many exons have several alleles, so let us estimate about one million variants. Humans have not found all the good ones, so I estimate that maybe 100 million are good codes. The estimate of 120 base pairs for the length of an exon is fine: most exons are shorter than 200 base pairs, and the average size is 386 base pairs (a gene has on average 8.8 exons and 3,400 protein-coding base pairs). There are four alternatives for each base pair, but even calculating with only two alternatives, 120 base pairs give 2 to the power 120 possibilities. That is about 10 to the power 36, while 100 million good codes is only 10 to the power 8. The chance that a random sequence is one of these good possibilities is therefore 10 to the power -28. That is, no chance at all with 3 mutations per generation.
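The arithmetic in the two paragraphs above can be checked with a few lines of Python. This is a minimal sketch that only reproduces the post's own assumptions (a 120-base exon, 3 mutations per generation, 10^8 "good" sequences, two alternatives per position); the constant names are illustrative. It also computes the count with all four bases, which the text mentions but does not work out.

```python
import math

EXON_LENGTH = 120        # assumed exon length in bases (from the text)
MUTATIONS_PER_GEN = 3    # protein-coding mutations per generation (from the text)
GOOD_CODES = 10**8       # the post's estimate of "good" exon sequences

# A designer who knew every target base would need one mutation per base.
designer_generations = EXON_LENGTH / MUTATIONS_PER_GEN

# Possible sequences with 2 alternatives per position (the text's lower bound)
# and with the real 4 bases (A, C, G, T).
seqs_binary = 2**EXON_LENGTH
seqs_dna = 4**EXON_LENGTH

# Fraction of all sequences that are "good", as a power of ten.
log10_fraction_binary = math.log10(GOOD_CODES) - EXON_LENGTH * math.log10(2)
log10_fraction_dna = math.log10(GOOD_CODES) - EXON_LENGTH * math.log10(4)

print(f"designer needs {designer_generations:.0f} generations")
print(f"2^120 = {seqs_binary:.2e}, fraction good = 10^{log10_fraction_binary:.1f}")
print(f"4^120 = {seqs_dna:.2e}, fraction good = 10^{log10_fraction_dna:.1f}")
```

With four bases instead of two the fraction becomes far smaller (around 10 to the power -64), so the two-alternative count in the text is, as stated, a lower bound on the number of possibilities.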

This was my problem with evolution, so I looked at human evolution, notably the evolution of the human brain. Humans have a gene family, NOTCH2NL, which may explain our larger brains. Monkeys and even orangutans do not have it; they only have the ancestral gene NOTCH2. Gorillas and chimpanzees have a non-functional form of NOTCH2NL, and only humans have this gene in a working version (Neanderthals also had it). It seems to have developed from the ancestral NOTCH2 3-4 million years ago in exactly the way I thought new protein-coding genes could arise. First NOTCH2 was duplicated, but the copy was only partial and did not work. In gorillas and chimpanzees it remains in this non-functional form, but in humans it was corrected by a mutation and now works well.

But it is not a new protein-coding exon. The new gene seems to work by a mutated control part turning off a function. With NOTCH2NL, a certain type of cell (neural stem cells) did not start differentiating as early as with the ancestral version, so these cells continued producing neurons and we got bigger brains. This is still turning existing exons on and off. The gene is described in this article:

https://www.sciencedaily.com/releases/2018/05/180531114639.htm

The estimate that the human mutation rate is about 1 mutation per 30 million base pairs per generation comes from

https://www.nature.com/news/2009/090827/full/news.2009.864.html

There are 88.4 million bases of protein-coding DNA (see the calculation below), which means about 3 protein-coding mutations per generation (88.4 / 30 ≈ 2.9). 3.5 million years is some 120,000 generations. Consequently there should have been about 360,000 protein-coding mutations in the last 3.5 million years. This is an estimate of how many mutations have accumulated in human DNA.
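A minimal Python sketch of the paragraph above, using only the figures quoted in the post. Note that turning 3.5 million years into 120,000 generations implies a generation time of roughly 29 years; that value is not stated explicitly in the text and is only what the rounding implies. The constant names are illustrative.

```python
CODING_BASES = 88.4e6        # protein-coding bases (3,400 per gene x 26,000 genes)
BASES_PER_MUTATION = 30e6    # 1 mutation per 30 million bases per generation
YEARS = 3.5e6                # time span considered in the text
GENERATIONS = 120_000        # the text's rounded generation count

coding_mutations_per_gen = CODING_BASES / BASES_PER_MUTATION
years_per_generation = YEARS / GENERATIONS  # implied, not stated in the text
# The text rounds ~2.95 up to 3 mutations per generation before multiplying.
total_coding_mutations = round(coding_mutations_per_gen) * GENERATIONS

print(f"{coding_mutations_per_gen:.2f} coding mutations per generation")
print(f"{years_per_generation:.1f} years per generation (implied)")
print(f"{total_coding_mutations:,} coding mutations in 3.5 million years")
```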

The share of new mutations that spread through the whole population depends on the effective population size. It seems there were very few humans. The effective population size has been estimated as 18,500 breeding individuals 1.2 million years ago

http://www.sciencemag.org/news/2010/01/human-ancestors-were-endangered-species

and even lower, 7,000-14,000, around the time of the Out-of-Africa migration

http://www.pnas.org/content/109/44/17758

Each human probably carries some 60 new mutations in their DNA, and about 1% of them are in protein-coding DNA. That is, 0.6 new protein-coding mutations per person. According to the study of human DNA cited above, three protein-coding mutations per generation have been passed on and accumulated. With N_e the effective population size, the population produces 0.6 × N_e new protein-coding mutations per generation, so one out of every 0.6 × N_e / 3 = 0.2 × N_e of them has been accepted into the population. Some of these mutations have been beneficial, many are neutral and some cause diseases. Deleterious mutations should be purged from the population over time, so counting over the long span of 3.5 million years we should get the number of mutations that are either beneficial or neutral.
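The acceptance ratio in the paragraph above can be tabulated for the effective population sizes quoted earlier. This is a sketch of the post's own bookkeeping (0.6 new protein-coding mutations per person, 3 accepted per generation), not a population-genetics model; the function name `mutations_per_accepted` is hypothetical.

```python
NEW_CODING_PER_PERSON = 0.6   # 60 new mutations x 1% in protein-coding DNA
ACCEPTED_PER_GEN = 3          # protein-coding mutations accumulated per generation


def mutations_per_accepted(n_e: float) -> float:
    """Hypothetical helper: how many new protein-coding mutations arise in a
    population of effective size n_e for each one that is accepted,
    i.e. (0.6 * N_e) / 3 = 0.2 * N_e."""
    return NEW_CODING_PER_PERSON * n_e / ACCEPTED_PER_GEN


# Effective population sizes quoted in the text.
for n_e in (7_000, 14_000, 18_500):
    print(f"N_e = {n_e:>6,}: one in {mutations_per_accepted(n_e):,.0f} accepted")
```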

For exons in humans I used the references:

https://www.ncbi.nlm.nih.gov/pubmed/15217358

https://link.springer.com/referenceworkentry/10.1007%2F978-1-4020-6754-9_5694

In the latter, the "average exon size" in humans is given as 3,400 bases. This is not the average length of one exon (which is 3,400/8.8 ≈ 386) but the total amount of protein-coding DNA in a gene.

Multiplying this by the number of human genes (c. 26,000) gives 88.4 million bases of protein-coding DNA in a human. Of course, these numbers are estimates, but good enough.
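The two numbers derived in the paragraphs above can be reproduced directly. A minimal sketch using only the figures from the cited references as quoted in the post; the constant names are illustrative.

```python
TOTAL_CODING_PER_GENE = 3_400   # reported "average exon size" = coding bases per gene
EXONS_PER_GENE = 8.8
HUMAN_GENES = 26_000            # approximate number of human genes

mean_exon_length = TOTAL_CODING_PER_GENE / EXONS_PER_GENE
total_coding_bases = TOTAL_CODING_PER_GENE * HUMAN_GENES

print(f"mean exon length ~ {mean_exon_length:.0f} bases")
print(f"protein-coding DNA ~ {total_coding_bases / 1e6:.1f} million bases")
```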

As I had not solved the problem of how mutations can create new protein-coding parts, I looked at worm DNA. Worms, after all, seem simpler than humans. It turns out that they are not much simpler.

The worm C. elegans has 22,227 protein-coding genes (humans have c. 26,000) and 561 annotated pseudogenes, though the true number of pseudogenes may be larger. Exons, the coding parts of a gene, have an average size of 123 bases (humans 386), with 6.4 exons per gene (humans 8.8). Introns, the non-coding parts between exons, have an average size of 47 bases. A worm gene is about 3,000 base pairs long (10,000-15,000 in humans). Exons make up 26% of a gene (similar in humans).

http://www.wormbook.org/chapters/www_overviewgenestructure/overviewgenestructure.html
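The 26% figure and the comparison with humans can be checked from the numbers in the paragraph above and the human figures used earlier in the post. A minimal sketch; it uses only the post's quoted averages, so the result is only as good as those averages.

```python
# Worm (C. elegans): averages quoted above
worm_exon_len, worm_exons, worm_gene_len = 123, 6.4, 3_000
# Human: coding bases per gene and gene-length range used earlier in the post
human_coding, human_gene_range = 3_400, (10_000, 15_000)

worm_coding = worm_exon_len * worm_exons          # coding bases per worm gene
worm_fraction = worm_coding / worm_gene_len       # share of the gene that is exon
human_fractions = [human_coding / g for g in human_gene_range]

print(f"worm: {worm_coding:.0f} coding bases per gene, {worm_fraction:.0%} of the gene")
print("human: " + " to ".join(f"{f:.0%}" for f in reversed(human_fractions)))
```

With these figures the human share comes out at roughly 23% to 34%, which is consistent with the "similar in humans" remark.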

So, that was a worm. It is quite similar to a human, though we have more non-coding DNA outside genes. I do not think worms will give me the solution to the problem of how mutations can create new exons. Well, until it is solved there is a serious gap in the theory of evolution. It is strange that researchers in the field have not filled the gaps. A solution with gaps would not pass in mathematics. I will keep on looking.

I put this post in the religion category since the theory of evolution is starting to look more and more like a religion, not science. I mean, you are supposed to have worked out the gaps before you propose a solution to a problem.
