Simple recipes for fixing Piffer’s problem

If you break something into pieces, it would be nice to be able to put it back together, preferably after fixing the problem. Nowadays you can hardly fix anything technical. If it breaks, it breaks, and if you open some device, a plastic piece probably breaks when you try to close it. Piffer’s results, however, can be fixed.

Davide Piffer has shown in many papers a very good correlation plot, in fact a straight line, between what looks like a measure of genotype (PGS) and a measure of phenotype (IQ). HBD (human biodiversity) people have been impressed by this and consider it a strong argument that IQ is determined by genes, or at least that the average IQ of a country is determined by the average polygenic score (PGS) of IQ-related SNPs. In fact, the line shows nothing of the kind. The PGS is not a measure of genotype. Although it is expressed in terms of SNPs, the numeric value of the PGS is derived from educational achievement scores, not from the effect of those SNPs. The SNPs only identify a population. Piffer’s line is what you get when you plot one good predictor of educational achievement (a PGS created for predicting educational achievement) against another good predictor of educational achievement (IQ, which was developed for predicting educational achievement). The result is a straight line. This is worse than the typical case where there is a correlation and we may wonder whether it implies causation. In Piffer’s plot the correlation itself follows directly from a mathematical relation: both axes are predictors of the same quantity, so there is nothing further to explain.

The problem in his results is that he summed a GWAS predictor of educational achievement over each subpopulation. By doing so he got a predictor that gives exactly the same result as he would get if each individual in each subpopulation had exactly the subpopulation average educational achievement score. So he could replace his test population with this kind of population. In this new population the predictor cannot do anything other than use the SNPs to identify the subpopulation to which a person belongs. The educational achievement score comes directly from the subpopulation average scores, and these averages have no direct connection with the SNPs used.
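The mechanism described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Piffer's pipeline: the group count, sample sizes, the 200 hypothetical SNPs and the uniform allele frequencies are all invented for illustration, and by construction no SNP has any effect on the score. Groups differ only in an environmental group mean, and allele frequencies differ between groups only by chance, so the SNPs merely tag group membership. A GWAS-style per-SNP regression on the pooled sample nevertheless produces a PGS whose group averages line up with the group mean scores.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def marginal_betas(genotypes, scores):
    """GWAS-style slope of score on allele count, fitted one SNP at a time."""
    n = len(scores)
    ms = sum(scores) / n
    betas = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mc = sum(col) / n
        cov = sum((c - mc) * (s - ms) for c, s in zip(col, scores))
        var = sum((c - mc) ** 2 for c in col)
        betas.append(cov / var if var > 0 else 0.0)
    return betas


def simulate_groups(n_groups=8, n_per_group=200, n_snps=200, seed=1):
    """Hypothetical populations: scores depend on the group only, never on SNPs."""
    rng = random.Random(seed)
    group_means = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
    # Allele frequencies differ by group at random, so SNPs only tag the group.
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(n_snps)] for _ in range(n_groups)]
    genotypes, scores, labels = [], [], []
    for g in range(n_groups):
        for _ in range(n_per_group):
            # Allele count 0, 1 or 2 (sum of two Bernoulli draws).
            genotypes.append([(rng.random() < f) + (rng.random() < f) for f in freqs[g]])
            scores.append(group_means[g] + rng.gauss(0.0, 1.0))
            labels.append(g)
    return genotypes, scores, labels, group_means


def group_average_pgs(genotypes, betas, labels, n_groups):
    """Average the individual PGS within each group."""
    pgs = [sum(b * x for b, x in zip(betas, row)) for row in genotypes]
    return [sum(p for p, lab in zip(pgs, labels) if lab == g) / labels.count(g)
            for g in range(n_groups)]


genotypes, scores, labels, group_means = simulate_groups()
betas = marginal_betas(genotypes, scores)
group_pgs = group_average_pgs(genotypes, betas, labels, len(group_means))
print("group-level correlation:", round(pearson(group_pgs, group_means), 3))
```

With these settings the group-level correlation should come out high even though the SNPs do nothing; the exact value depends on the seed. Lowering `n_snps` makes the points scatter more, since fewer tagging SNPs identify the groups less precisely.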

Indeed, a fairly good Piffer correlation can be made using SNPs that determine skin color, although skin color genes have no effect on the IQ that the educational achievement score is supposed to reflect.

That some SNPs may not be related to the property being measured is a known problem with GWAS, and researchers usually manage to remove it. For instance, in a GWAS of lung cancer, some SNPs may reflect a smoking habit rather than cause lung cancer, while others are linked to lung cancer itself. The usual fix is to remove the former. So this is known. It is also known that if the predictor sample includes genetically different populations, the predictor cannot be used to compare other populations with each other, an error made in Dunkel et al. (2019). Neither of these two known problems is the fatal problem in Piffer’s work.

Piffer’s problem is related to these two, but it is different. All SNPs in the PGS used by Piffer are linked to educational achievement, so unlike skin color genes, they cannot be removed from the PGS. The problem is that their combined effect is rather small and they do not explain the educational achievement scores. The scores the PGS gives are not derived from the real (small) effect of these SNPs; they reflect the measured values of educational achievement. Consider an example. If education is changed in a country, its educational achievement scores often change. The population does not change, so the genes do not change. If the PGS reflected genes, we would expect the PGS score for this country to stay the same, so that the data point of this country moves away from Piffer’s straight line. This is not what happens. The PGS score for this country changes, even though the genes do not. It is easy to see why if this country is in the prediction sample: the PGS score mimics the educational achievement score because that is what it is calculated from. It predicts the score; it is not a true value derived from genes. But this can happen even if the country is not in the prediction sample: if a genetically similar country with a similar change in education is in the prediction sample, this country, being genetically similar to the one used to create the predictor, moves the same way. You see the problem here. The data points stay quite well on a straight line, but they can move along the line for environmental reasons.
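The "education reform" example can be sketched with the same kind of toy model. Again this is a hypothetical setup, not Piffer's data or method: six invented groups, 150 hypothetical SNPs with no causal effect, and an assumed score gain of 1.0 for group 0 only. The genotypes are kept identical and the GWAS-style per-SNP regression is simply run twice, once on the scores before and once after the reform.

```python
import random


def fit_betas(genotypes, scores):
    """Per-SNP marginal least-squares slope, as in a simple GWAS scan."""
    n = len(scores)
    ms = sum(scores) / n
    betas = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mc = sum(col) / n
        cov = sum((c - mc) * (s - ms) for c, s in zip(col, scores))
        var = sum((c - mc) ** 2 for c in col)
        betas.append(cov / var if var > 0 else 0.0)
    return betas


def pgs_gap(genotypes, betas, labels, group=0):
    """Mean PGS of one group minus the mean PGS of all other groups."""
    inside, outside = [], []
    for row, lab in zip(genotypes, labels):
        score = sum(b * x for b, x in zip(betas, row))
        (inside if lab == group else outside).append(score)
    return sum(inside) / len(inside) - sum(outside) / len(outside)


rng = random.Random(2)
n_groups, n_per_group, n_snps = 6, 200, 150
# Hypothetical allele frequencies and environmental group means; no SNP has a causal effect.
freqs = [[rng.uniform(0.05, 0.95) for _ in range(n_snps)] for _ in range(n_groups)]
group_means = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
genotypes, labels, noise = [], [], []
for g in range(n_groups):
    for _ in range(n_per_group):
        genotypes.append([(rng.random() < f) + (rng.random() < f) for f in freqs[g]])
        labels.append(g)
        noise.append(rng.gauss(0.0, 1.0))

reform = 1.0  # hypothetical score gain in group 0 after an education reform
before = [group_means[g] + e for g, e in zip(labels, noise)]
after = [s + (reform if g == 0 else 0.0) for s, g in zip(before, labels)]

# Same genotypes, two GWAS runs: only the measured scores differ.
gap_before = pgs_gap(genotypes, fit_betas(genotypes, before), labels)
gap_after = pgs_gap(genotypes, fit_betas(genotypes, after), labels)
print(f"group 0 PGS gap before reform: {gap_before:.2f}, after: {gap_after:.2f}")
```

In this marginal-regression setup the gap can only grow: each SNP's contribution to the change is proportional to the squared difference between group 0's mean allele count and the overall mean. Group 0 therefore moves up the line although not a single genotype changed.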

How to fix this?

I suggest that Piffer explain that the country-average PGS is only a predictor, not an explanation. It correctly gives the observed trend in educational achievement and as such is a good starting point, but the new contribution would be an improvement of this predictor. This part is missing from the paper, but adding it would give a nice result. Piffer selects a set of environmental factors that are known to influence educational achievement, builds an additive correction term to the PGS predictor as a sum of these environmental factors, and fits it to a large set of populations. He can then show that his improved predictor is better than the original PGS. He can estimate how much the educational achievement score can be affected by those environmental factors, and he can even give recommendations. In conclusion, he notes that this much of educational achievement can be influenced by fairly simple environmental factors, but that this is not the total effect of the environment, since the trend itself has environmental causes in addition to genetic ones.
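The additive correction can be sketched as an ordinary least-squares comparison of two nested models: score on PGS alone, and score on PGS plus environmental factors. This is only a sketch under assumptions: the 60 countries, the two environmental factors and all numbers are synthetic values generated purely to exercise the code, and the choice of plain OLS is mine, not something taken from Piffer's papers.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (rows include a 1 for the intercept)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def rss(rows, y, coef):
    """Residual sum of squares of a fitted linear predictor."""
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2 for r, t in zip(rows, y))


rng = random.Random(3)
n_countries = 60
pgs = [rng.gauss(0.0, 1.0) for _ in range(n_countries)]
# Two hypothetical, standardized environmental factors per country (e.g. schooling, nutrition).
env = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_countries)]
score = [100 + 8 * p + 3 * e[0] + 2 * e[1] + rng.gauss(0.0, 2.0) for p, e in zip(pgs, env)]

base_rows = [[1.0, p] for p in pgs]
full_rows = [[1.0, p, e[0], e[1]] for p, e in zip(pgs, env)]
base = ols(base_rows, score)
full = ols(full_rows, score)
base_rss = rss(base_rows, score, base)
full_rss = rss(full_rows, score, full)
print("PGS only, RSS:", round(base_rss, 1))
print("PGS + environmental correction, RSS:", round(full_rss, 1))
print("fitted environmental weights:", [round(w, 2) for w in full[2:]])
```

Because the models are nested, the corrected predictor can never fit worse in-sample, so a fair comparison on real data would need held-out populations or a penalty for the extra parameters. The fitted weights are what would give the estimate of how much achievement the chosen factors can shift.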

The second result could come from investigating the trend. Piffer builds a cultural model in which advanced culture spreads from some centers, like Europe/USA and China/Japan, and also from India, where however the caste system limits the effect to the upper castes. The genetic distance of a population from these centers is then likely to correlate with its educational achievement: the further away, the lower the expected achievement. This gives a predictor that uses only environmental factors. Piffer compares the trend of this predictor with the trend of the PGS predictor and draws some carefully formulated conclusions about the role of genes.
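One way to make this concrete is to define the environmental predictor as minus the genetic distance to the nearest cultural center and compare its standardized trend with that of the PGS. This is a hypothetical sketch: the distances, PGS values and scores below are invented toy numbers, not real data, the function name `diffusion_predictor` is my own, and the caste restriction in India is not modeled at all.

```python
def diffusion_predictor(dist_to_centers):
    """Hypothetical predictor: minus the distance to the nearest cultural center."""
    return -min(dist_to_centers.values())


def standardized(xs):
    """Rescale a list to mean 0 and (population) standard deviation 1."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return [(x - m) / sd for x in xs]


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# Invented toy inputs (not real data): distances to three centers, PGS, and scores.
distances = [
    {"Europe/USA": 0.01, "China/Japan": 0.12, "India": 0.08},
    {"Europe/USA": 0.10, "China/Japan": 0.02, "India": 0.09},
    {"Europe/USA": 0.06, "China/Japan": 0.07, "India": 0.03},
    {"Europe/USA": 0.15, "China/Japan": 0.14, "India": 0.12},
]
pgs = [0.9, 1.0, 0.2, -1.1]
scores = [100, 103, 90, 78]

env_pred = [diffusion_predictor(d) for d in distances]
# Trend = score change per standard deviation of each predictor.
print("diffusion-model trend:", round(slope(standardized(env_pred), scores), 2))
print("PGS trend:", round(slope(standardized(pgs), scores), 2))
```

Standardizing both predictors puts the two trends on the same scale, so they can be compared directly; with real data the interesting quantity would be how much of the PGS trend remains once the diffusion predictor is accounted for.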

A third result could be obtained by looking at stratification within a country. A prime example is India with its caste system, but the class societies of England and France will do, as will the USA with its WASPs and Jews. Piffer takes up the problem that SNPs basically only identify a subpopulation and tries to find cases where this identification of a subpopulation shows differences in educational achievement that are clearly not a result of genes.

Those are three possible results I suggest for Piffer. You can probably easily think of more.

