Where did the coronavirus Covid-19 originate?

Some conspiracy theorists claim that the present coronavirus may have escaped from a certain American bioweapon laboratory. I had to look into this, as the typical argument against this kind of claim is that the author is not an expert on RNA viruses. But nowadays, with the web, you do not need to be an expert to find things out; it is enough to find some papers by experts. I typed some keywords into Google and found some expert papers.

            See this Korean article from Feb. 2020:

            JM Kim et al, Identification of Coronavirus Isolated from a Patient in Korea with COVID-19

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7045880/

            Figure 3 shows the phylogenetic tree analysis of SARS-CoV-2. In the figure the BetaCoV tree branches from BetaCoV/Wuhan/WV07/2019. That is, all variants of this virus are in the tree that started in Wuhan. However, this paper is mainly interested in the virus variant in Korea. We can conclude that the virus came to Korea from Wuhan.

            But this is not the whole story. The following preprint analyses all 48 variants of the Covid-19 virus available so far:

            T. Koyama et al, Variant analysis of COVID-19 genomes, preprint Feb. 2020-03-20 (the authors are from IBM, I will call it IBM preprint)

https://www.researchgate.net/publication/339461351_Variant_analysis_of_COVID-19_genomes

            The first figure shows a tree growing from a single ancestor, MT007544. This ancestor was found in an Australian sample. The first branch is MT044258 (California). The next branch is MT039890 (Korean traveller to Wuhan, not connected with the Wuhan seafood market), then MN988713 (Chicago, Illinois). After these the tree splits into a branch with two US samples, MN994467 (California) and MT044257 (Illinois), and a separate subtree which includes MT049951 (Yunnan, China, not from the Wuhan seafood market), LC522973, LC522974, LC522975 (all three from Japan), MN975262 (Shenzhen, China, not from the Wuhan seafood market), NMDC60013002-04 (Wuhan, not from the seafood market), MN997409 (Arizona), MN938384 (Shenzhen, China, not from the seafood market), MN985325 (Washington, the first US case) and MT066175 (Taiwan).

            Later comes the subtree from the Wuhan seafood market, but we are interested only in the origins of the virus. The Chinese conclusion that the origin of this pandemic is not the Wuhan seafood market is clearly correct. The claim that the origin of the virus is not China but the USA seems quite reasonable if the preprint is correct. If so, the virus may have escaped (or been stolen) from the USA and got to China in some way (maybe with the US military athletes who visited China in October 2019, some of whom were ill). So, based on this preprint, the conspiracy theory could be possible. We would then reason as follows:

Who in the USA would have studied viruses of Chinese bats? The closest match to the genome of Covid-19 is found in bats in China, not in bats or any other animals in the USA. Thus, somebody in the USA would have been studying bat viruses. If nobody from the civilian sector in the USA admits responsibility, then the natural suspect is the military. Of course the military of any country studies bioweapons, but only some are advanced in this field: a country with deep knowledge of these viruses can also create vaccines for them quickly, so we find some obvious candidates rather easily.

But there is still a problem with the preprint from the IBM researchers. A paper from Australia gives a phylogenetic tree where the Australian sample MT007544 (presumably from one of the people returning to Sydney from Wuhan around 25 Jan 2020) is in a subtree, far from the root of the tree in China. See Figure 2a in

G. Taiaroa et al, Direct RNA sequencing and early evolution of SARS-CoV-2

https://www.biorxiv.org/content/10.1101/2020.03.05.976167v1.full.pdf

Taiaroa et al must have the same data (see where the Taiwan, Japan and US samples are), but their tree is different from the one in the IBM preprint. So I had to search further. In the paper

            T. Matsuda et al, Genome phylogenetic tree analyses revealed evidence that the severe acute respiratory syndrome coronavirus 2 had been introduced to Taiwan, the United States, and Japan several times

http://www.med-mec.com/data/20200219_2019nCov_arXiv.pdf

            there is Figure 2 with a short sequence from each of the 48 virus variants. We can very easily conclude that the IBM researchers’ preprint seems to be wrong. In this short sequence MN908947 (Wuhan) is exactly the same as MT044258 (California). (Naturally, they would not be exactly the same if we took a longer RNA sequence.) The preprint claims that MT039890 (Korea) developed from MT044258, and MN988713 (Illinois) from MT039890. But this would mean several back mutations, like G->T->G. There would (in my opinion) be too many of these back mutations, so the phylogenetic tree of the preprint seems to be wrong.

            I conclude that while the conspiracy theory may be possible, it may well be wrong: Wuhan may well be the origin of this coronavirus. This time there may not be any conspiracy. But we will see when and if the preprint gets published.

            If you want to do it yourself, it is not difficult. The complete RNA sequences were, at least, linked on the right side of the German Wikipedia page. Just load them into a program and calculate the distance between each pair. I think this is what was done in the preprint, but there is something more: you should limit the number of back mutations, as they are not that probable. For my part, I will wait for some other researcher to build the phylogenetic tree and get it accepted.

            So far it can go both ways, yet I think that bat viruses did not just jump to humans. Three coronaviruses (SARS, MERS, Covid-19) in a rather short time. I suspect that some researcher studied RNA viruses and the virus escaped. One should not study everything possible. This is a killer virus.
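The pairwise comparison suggested above (take the sequences into a program and calculate the distance between each pair) could look roughly like the following Python sketch. It is only an illustration, not the method of the preprint: it assumes the sequences have already been aligned to the same length, it uses a plain count of differing positions as the distance, and the sample names and short sequences are made-up toy data, not real genomes.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_distances(seqs: dict) -> dict:
    """Return {(name1, name2): distance} for every unordered pair of samples."""
    return {
        (m, n): hamming(seqs[m], seqs[n])
        for m, n in combinations(sorted(seqs), 2)
    }


# Toy data (hypothetical names and sequences, NOT real genome data).
toy = {
    "sample_A": "ACGTACGTAC",
    "sample_B": "ACGTACGTAT",
    "sample_C": "ACTTACGTAT",
}

distances = pairwise_distances(toy)

# Print the closest pairs first; close pairs are candidates for neighbours in a tree.
for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, d)
```

Real genomes differ in length because of deletions and insertions, so they would first have to be aligned with a proper alignment tool; a plain position-by-position count like this one is only meaningful after that step.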

Update: there is now a new preprint on ResearchGate (I have not read it yet, but check it):

https://www.researchgate.net/publication/338938615_Evolution_and_variation_of_2019-novel_coronavirus

I requested the preprint from the authors but have not got it yet. From the abstract it follows that the Wuhan seafood market was not the origin of this pandemic. It started some months earlier. The sample EPI_ISL_403928, which the authors find to be from a different strain, is also from Wuhan, China (2019-12-30). The abstract at least does not point to the USA as the origin.

I suggest the following link and the discussion there:

http://virological.org/t/novel-2019-coronavirus-genome/319

Another update: there is a new preprint on ResearchGate:

https://www.researchgate.net/publication/338957445_Uncanny_similarity_of_unique_inserts_in_the_2019-nCoV_spike_protein_to_HIV-1_gp120_and_Gag

It also has the complete paper. The claim is that the virus has parts that do not appear in other coronaviruses but are identical or similar to those in the HIV-1 virus. If so, this virus was created in a laboratory.

7 Comments

Beefcake the Mighty March 22, 2020

Hi j2, can you recommend a good tutorial for getting up to speed on these kinds of (genomic) calculations (I have a background in mathematics, FWIW)? Thanks.

jorma March 22, 2020

Hi Mighty,
I am not from that field and cannot recommend any tutorial, but I can explain how I would do it. From the IBM preprint read the following text.
“In this study, we have collected 48 publicly available genomes from 2019-vnCoV infected patients. 80 distinct variants were identified with 43 missense, 21 synonymous, 3 deletion, 11 non-coding and 2 non-coding deletion types. Most common variants were 28144T>C and synonymous 8782C>T in 13 samples which occur for the same samples mostly collected outside of Wuhan.”
Look up in Wikipedia what missense and the other mutation types mentioned are. Basically, what happens with the RNA code is that some symbol is changed to another, like C to T. The place they mention here is position 8782, C to T. You do not need to care about the names of these places. Simply take some (or even all 48) full RNA genomes from Wikipedia. Take the German Wikipedia page https://de.wikipedia.org/wiki/SARS-CoV-2 (switch to the German page if you get the English one); on the right side you see links like MN985325. They are the variants. Click those that you want to investigate. You find the full RNA code behind the link. Copy the full code. Then compare the files of two variants (in Unix there is diff, as I remember). You have to look at what the differences are, but mostly they are single-base changes. Store the number of changes for each pair of variants and try to put them in order. As the IBM preprint has the circle diagram where they did just that, you can check whether they did it correctly.
Then, the only contribution I can make is this: if you now have some theory of how these variants developed from each other (I think you are interested in 5-10 variants if you want to know where this virus originated, so it is not too much work), look at each case where one variant is supposed to be derived from another and see if there are too many back mutations or something else improbable. A back mutation means that first something mutated, like T to C, and then C mutated back to T. It is unlikely, as C could have mutated to any of three possible bases, not necessarily to T. You get a probability: the square of the probability of a mutation at one base, times 1/3. (That is, a back mutation is much more improbable than a single mutation.)
But it is quite possible that the virus did originate in the USA even if the phylogenetic tree in the preprint is not quite OK. The fact that many early variants of the virus did come to the USA is not quite what one would expect.
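As an alternative to diff, the comparison of two variants described above could be sketched in Python as below. This is just my illustration under assumptions: the two sequences must already be aligned to the same length, the function name is made up, and the short sequences are toy data rather than real genomes. The output uses the same kind of notation as the preprint (position, old base, new base, as in 8782C>T).

```python
def list_substitutions(ref: str, other: str) -> list:
    """List single-base differences between two aligned sequences.

    Positions are 1-based and each change is written like '8782C>T'.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{pos}{r}>{o}"
        for pos, (r, o) in enumerate(zip(ref, other), start=1)
        if r != o
    ]


# Toy data (hypothetical sequences, NOT real genome data).
toy_reference = "ACGTCCGTAT"
toy_variant = "ACGTTCGTAC"

changes = list_substitutions(toy_reference, toy_variant)
print(changes)       # ['5C>T', '10T>C']
print(len(changes))  # number of changes to store for this pair
```

The count of changes for each pair is the number to store and put in order; the list itself shows which positions to look at when checking for back mutations.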

I would look at the last paper and the picture there and select only those variants that are at most three steps from the Wuhan sample, which is the first in the picture. Only one of those can be close to the real root. That does not leave many. Then take the full genomes of all of them but compare only a part initially, maybe a part twice as long as the one compared in the last paper. Mutations are not difficult: either they change one base, or they change, skip or duplicate a whole part. By looking at the changes with diff, you should be able to see whether any of these can have mutated from another. Preferably, there are no back mutations in your tree.
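The back-mutation check could be sketched as follows. Again this is only an illustration under assumptions: the chain is a proposed order of descent (first sequence is the supposed ancestor), all sequences are already aligned to the same length, the function names are made up, the toy sequences are not real data, and the probability function just encodes the rule of thumb given above (square of the single-mutation probability times 1/3) with a hypothetical value of p.

```python
def back_mutations(chain: list) -> list:
    """Find sites where a base changes and later returns to an earlier base.

    Returns a list of (1-based position, history) pairs, e.g. (3, 'G->T->G').
    """
    length = len(chain[0])
    if any(len(seq) != length for seq in chain):
        raise ValueError("sequences must be aligned to the same length")
    found = []
    for pos in range(length):
        # Record the base at this site each time it changes along the chain.
        history = [chain[0][pos]]
        for seq in chain[1:]:
            if seq[pos] != history[-1]:
                history.append(seq[pos])
        # If some base appears twice in the history, the site mutated back.
        if len(history) != len(set(history)):
            found.append((pos + 1, "->".join(history)))
    return found


def back_mutation_probability(p: float) -> float:
    """Rule of thumb from the text: p squared, times 1/3."""
    return p * p / 3


# Toy chain (hypothetical, NOT real genome data): ancestor -> child -> grandchild.
toy_chain = ["ACGT", "ACTT", "ACGT"]
suspicious = back_mutations(toy_chain)
print(suspicious)  # [(3, 'G->T->G')]
```

A proposed tree in which many sites show up in this list is one I would not trust; the fewer back mutations, the better.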

Sorry that I cannot help more, but this one is not my field.

Beefcake the Mighty March 23, 2020

Thanks j2.

Beefcake the Mighty April 9, 2020

BTW, turns out you were 100% right about Ron Unz, check out his latest comment here:

https://www.unz.com/runz/the-government-employee-who-may-have-saved-a-million-american-lives/?showcomments#comment-3825457

“Fraud” doesn’t even remotely describe the guy. Fool me once…

jorma April 9, 2020

I hope I understand what you mean. You must be referring to this sentence:
“Imagine what the numbers would have been like without the lockdowns that these individuals are ferociously denouncing.”
It is a typical problem with (ex-)physicists. They try to imagine things and use intuition without doing the math, at least not correctly. From 15 Mar to 22 Mar the number of infected in New York grew 19-fold (the R0 value of a virus depends on the behavior of the population; obviously New Yorkers meet each other very often, and this is very fast growth). So we can estimate that in 4 weeks the original number of some 250 on 14 March would have grown to 32 million, but as the population of New York is 8.6 million, the number of infected would have been about 90% of the population, 7.7 million. No need to imagine anything.
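The arithmetic behind the 32 million can be checked with a few lines of Python. This is a sketch of the naive estimate only: it assumes the 19-fold weekly growth simply continues for four weeks, and the variable names are mine. It does not model how the growth saturates near the population size, so the 90% figure is not derived here.

```python
initial_cases = 250        # approximate number of infected on 14 March
weekly_factor = 19         # observed growth factor over one week
weeks = 4
population = 8_600_000     # population of New York

# Unconstrained exponential growth: 250 * 19^4.
unconstrained = initial_cases * weekly_factor ** weeks
print(unconstrained)  # 32580250

# The estimate exceeds the whole population, so growth must saturate before that.
print(unconstrained > population)  # True
```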

Anyway, there were too many trolls on Unz. I saw that you and Iris still comment there, but I got tired of people like Wally and Carolyn Yeager, so I decided to quit following that site.

Another thing: I saw a paper that describes (I think freely available) software tools for studying the coronavirus genome:
https://www.researchgate.net/publication/338989161_Genome_Detective_Coronavirus_Typing_Tool_for_rapid_identification_and_characterization_of_novel_coronavirus_genomes
But I personally would just wait until other researchers do this, as papers come out very often. The question now is whether the Wuhan bat virus, which has 3 of the 4 insertions in Covid-19, is man-made, that is, a bioweapon passed through a bat in order to stabilize it. This could be assessed by checking whether the bat virus is younger than SARS. I have not yet looked at this, as I am now writing a paper on the problem with the path integral.

Greetings to you and Iris.

Beefcake the Mighty April 9, 2020

No, I’m referring to this statement:

“ Our website tends to practice very light moderation, which presumably draws in those lunatics who have been banned most other places. But their endless stream of crazy comments is cluttering things up, so I think I may just have to go ahead and sharply restrict the quantity of their ranting.”

In other words, he’s going to start deleting the posts of people who disagree with his alarmist viewpoint. Now, that’s his right of course, but the simple fact that he doesn’t understand statistical analysis (see his response to me on that thread) while dismissing skeptics as lunatics suggests to me that he’s just another limited hangout (as other alt-media sites have been exposed as), and that his claims to be providing a forum for open debate on controversial topics are pure rubbish. The fact that the known idiot utu is cheering him on here is a pretty good indication of what’s going on.

jorma April 9, 2020

“The fact that the known idiot utu is cheering him on here is a pretty good indication of what’s going on.”

I kind of thought you had something else in mind than that Ron cannot estimate how many would be infected, and I did think there were sockpuppets on that site. Anyway, you and Iris were great. BR. j2

More on this coronavirus genome studying tool (but I have no time to try it now, at least):
https://www.researchgate.net/publication/327140426_Genome_Detective_An_Automated_System_for_Virus_Identification_from_High-throughput_sequencing_data
