Monday 15 February 2021

A theory is meant to explain something.

MyHeritage has just forwarded me the latest instalment in their fantasy series Theory of Family Relativity™. They boast that they have "found a theory that can explain how one of your DNA Matches is related to you" and inform me (in large bold type) that "R.... R..... is your 4th cousin once removed".

Some epistemologists might be puzzled by the idea that a theory can be "found" rather than being painstakingly constructed. But in this case I am happy to accept that finding a theory (sic) is a convenient shorthand for "cobbling together from a pile of old tat that was lying around the office".

The moment that I clicked the link to View Theory, the merest hint of reality crept into the overweening confidence of the email message. In type distinctly smaller than the statement of the Theory of Family Relativity itself, I read "This path is based on one community tree, 3 record collections and 5 MyHeritage family trees, with 20% confidence."

As I scrolled across multiple screens, I realised that this was a Tarzan argument -- leaping from tree to tree with naught but a loincloth to hide one's shame. Each of the eight transitions depends upon the auxiliary hypothesis "so long as these two personae actually represent the same unique individual".

To be fair, MyHeritage does not try to hide this essential feature of their case. At each change of source, they have assigned a percentage probability (although they might prefer a more positive term, like confidence or even certainty). Those probabilities range from 92% (that my grandmother in my tree is the same person as someone with identical name and vital dates in another tree) to 20% (that R.... R...'s grandfather was reported in the 1891 Census with a different surname, preferred given name and date of birth to those by which he was apparently known to his family).

The complete set of probabilities upon which the chain of the argument depends is {92%, 27%, 27%, 37%, 27%, 100%, 25%, 20%}. I will freely admit that it has been many years since I studied compounding probabilities but I am fairly certain that simply taking the smallest value in the set and assigning that to the complete chain is not how we did it. But that is the only way that I can see by which they could have arrived at the value they quote viz "with 20% confidence".

I did think that a more realistic estimate of P(IF a AND b AND c) might be P(a)*P(b)*P(c); but if the top of the page had read "with 0.0335% confidence" perhaps not everyone would read on.
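For anyone who wants to check the arithmetic, here is a minimal sketch in Python that compares the two ways of combining the eight step values. It assumes the steps are independent of one another, which is what multiplying them implies. The function names are my own invention, and nothing here claims to reproduce whatever MyHeritage actually does internally.

```python
from math import prod

# The eight per-step values quoted above, written as fractions.
step_confidences = [0.92, 0.27, 0.27, 0.37, 0.27, 1.00, 0.25, 0.20]


def weakest_link(steps):
    """Hypothetical rule: report only the smallest step value."""
    return min(steps)


def chained(steps):
    """Probability that every step holds, assuming the steps are independent."""
    return prod(steps)


print(f"weakest link: {weakest_link(step_confidences):.0%}")
print(f"product of all steps: {chained(step_confidences):.4%}")
```

Under either rule, a single step valued at zero drags the whole chain down to zero.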

But set aside the mathematical pedantry, and let's examine the actual genealogical evidence that has been assembled. I am happy to accept that my grandmother has been correctly identified in the second tree, so we shall move on to the next leap - into the 1891 UK Census. I am actually surprised that the quoted probability is as high as 27% given that there were hundreds, perhaps thousands, of young women of Welsh heritage called Jane Davies. Just why this "theory" should choose one called Sarah Jane as its "match" is somewhat puzzling.

But that point is moot because the probability of Nan who arrived in Queensland aboard the SS Corona in 1884 later being recorded living in England in 1891 is actually nil (or a bit less). Since the smallest probability of any individual step in the argument is now zero, the confidence that one can have in the theory as a whole, even by MyHeritage's idiosyncratic algorithm, is ZERO.

The revelation that Tarzan's grip on the vine failed at just the second swing took away all my enthusiasm for revealing any subsequent nonsense bundled in the Theory.

I imagine that R... R... has come to the same conclusion as he begins to explore the theory from his end. I do hope that he does not waste too much time because there is nothing for him on this side.
