A friend sent me a blog post about a retracted Johns Hopkins study that contradicts what I have heard about COVID deaths. Have you heard anything about this? What is your take? - R from Mississippi
"A friend sent me a blog post" is an ominous way to start a message (and this is coming from a person who writes blog posts), but I had not heard about the study and was curious. The blog post (which I will not link to) provided a couple of plots and a quick summary. The basic thesis was that the increased number of deaths linked to COVID was almost perfectly offset by drops in other causes of death, meaning the COVID deaths might actually be mischaracterized. On the one hand, this is not surprising to hear. These sorts of dubious claims have been around for months and have been debunked many times, including here (https://markpanaggio.wixsite.com/home/post/exploring-excess-deaths). I spent quite a bit of time looking at the death counts recorded by the CDC, and it was abundantly clear that there had been many more deaths this year than normal. On the other hand, the blog post the reader sent suggested (a) that the author was from Johns Hopkins (which I consider to be a respectable institution...although I might be a bit biased on that front) and (b) that the author used the same CDC data that I used, so I decided to take a closer look.
Disclaimer: I am not speaking as a representative of JHU. This blog reflects my personal views which may not reflect those of my employer.
For those of you who want to skip to the punch line (or as the kids say, TL;DR): After carefully reviewing the "study" in question, it is apparent that it has serious methodological problems, that it completely ignores trends in the data that contradict its central thesis, and that the "evidence" it provides does not even support the claims of the speaker and subsequent summaries. In other words, this study is just plain wrong, and that is why it only received attention when people within the blogosphere and on social media latched onto the flawed conclusions without carefully examining the details.
The blog linked to a JHU website that had shared the initial study. I was expecting some sort of press release, but instead I found the JHU Newsletter, which indeed included a retraction letter for a previous article citing the study in question. However, I was surprised to learn that (a) the JHU Newsletter is a student-run periodical for students who are “interested in journalism” since JHU “does not have a formal journalism program” and (b) the study was actually a webinar by an adjunct professor rather than a peer-reviewed research article. This means that the article was published and then retracted not by the university itself but by a student organization, and the contents of the study were not peer reviewed or vetted in any way. Unfortunately, the article was sparse on details and did not provide a link to the source material. Fortunately, this dead end was only temporary, as a web search revealed that the American Institute for Economic Research (a libertarian think tank) had written about it and shared the webinar in its entirety (it is also available on YouTube). So, I decided to bite the bullet and watch the 67-minute webinar to see what it was all about.
Why? Because I am a glutton for punishment. But also because I think it's important to be willing to have your beliefs and assumptions challenged, and I want to be open-minded. About 40 minutes later (I like to watch on a higher speed), I finished the video and was no more enlightened than I had been before. The talk focused on three main plots (which you can find here along with the video if you are curious and want to compare to my descriptions: https://www.aier.org/article/new-study-highlights-serious-accounting-error-regarding-covid-deaths/). Let's talk about them one by one.
The first plot showed the breakdown of deaths by age group in each week. The speaker pointed out that the age distribution of deaths was relatively consistent throughout the year and that the distribution did not change post COVID. This surprised the speaker because COVID is especially dangerous to the elderly and yet their share of deaths did not change or at least not in a way that stood out in the plot. There are a couple of problems with this:
1. Eyeballing a plot is just not a good way to test hypotheses. Plots can be misleading (both intentionally and unintentionally), and one way of visualizing the data can reveal a trend that another conceals. That is why we have a discipline called statistics that provides a suite of tools for testing hypotheses. Unfortunately, the speaker did not actually use anything beyond a bar chart from Microsoft Excel.
2. The hypothesis that COVID should produce a noticeable change in the distribution of deaths across age groups only makes sense if deaths due to COVID have a markedly different age distribution than deaths in general. However, as I pointed out in an earlier post (https://markpanaggio.wixsite.com/home/post/this-one-chart-showing-mortality-risk-will-blow-your-mind), death itself primarily affects the elderly and the differences between the age distributions of COVID and other causes are subtle at best. So, there is no reason to expect a dramatic change in that distribution post COVID, especially not the type of change that could be detected by eyeballing a plot. This is yet another reason why you should use statistics and not a cursory examination of a graph before drawing major (and contrarian) conclusions.
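To see why the shift should be subtle, here is a back-of-the-envelope sketch in Python. The numbers are made up purely for illustration (they are not CDC figures): suppose 75% of non-COVID deaths occur among people 65 and older, 80% of COVID deaths do, and COVID accounts for 10% of all deaths in a given week. The function name is my own.

```python
def blended_share(share_other, share_covid, covid_fraction):
    """Share of all deaths in an age group once COVID deaths are mixed in."""
    return (1 - covid_fraction) * share_other + covid_fraction * share_covid

# Hypothetical inputs for illustration only, not CDC figures.
share_65_plus_other = 0.75   # assumed 65+ share of non-COVID deaths
share_65_plus_covid = 0.80   # assumed 65+ share of COVID deaths
covid_fraction = 0.10        # assumed fraction of all deaths due to COVID

before = share_65_plus_other
after = blended_share(share_65_plus_other, share_65_plus_covid, covid_fraction)
print(f"65+ share before: {before:.1%}, after: {after:.1%}")
```

Under these assumed numbers the 65+ share moves from 75.0% to 75.5%, a half-point change that nobody is going to spot by squinting at a stacked bar chart.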
The second plot showed the weekly deaths attributed to different causes. The author zoomed in on one small portion of the plot (early April) and then compared it to another year (2018) and pointed out that the deaths caused by things like heart disease and cancer looked a bit lower in 2020 than in 2018. This was an interesting idea, but the discrepancies she was claiming were far from obvious from the plot she showed (she admitted as much), and again there was no statistical analysis provided at all. The remarkable thing was that while the speaker claimed to observe a subtle trend in the death counts, she completely ignored the obvious one that was staring her in the face: the dramatically higher total death count in 2020 as compared to previous years, including 2018 (which had a particularly severe flu season). Don't believe me? See for yourself. Here is a screenshot from her presentation showing the total number of deaths (blue) by week over the last few years as well as the total number of deaths due to other causes (various colors at the bottom). It is painfully obvious that 2020 has higher totals than previous years and that the excess occurred primarily during two spikes that coincide perfectly with the peaks of COVID deaths.
Before we dismiss her claims entirely though, I should note that she did provide a table with some additional details. Unfortunately, this table (the third of the aforementioned plots) made even less sense than the other plots. It showed the change in deaths from one week to the next due to a variety of causes. She then claimed that the increase in deaths due to COVID was almost perfectly offset by a decrease in deaths due to heart disease. Again, there were a host of problems with this.
1. She claimed to have data from February to September but only showed results for three weeks in April. Three datapoints is simply not enough to detect a pattern! Choosing to focus on three weeks is an odd choice that suggests that either the "pattern" she was seeing didn't hold up or she didn't bother to look. Either is a concern.
2. Looking at the change from the previous week does not make sense. The total deaths in a given week vary for all kinds of reasons that have nothing to do with COVID or with the larger trends (the season, whether a holiday or large event took place during the week, reporting lags, etc.). So the change from week to week is largely noise. What you want to know is whether the death totals were different this year as compared to the same week in previous years. As the plot above shows, the death totals this year are actually higher in most weeks.
3. The values in her table literally do not add up. She claims that the rise in COVID deaths was offset by drops in deaths due to heart disease. However, in her table the deaths due to heart disease actually increased during one of the weeks. The only way that could be offset by the change in COVID deaths is if there was a negative number of COVID deaths!
So, the best case scenario is that the table was mislabeled and confusing and I just misinterpreted it (you would hope that a good study would be comprehensible to a data scientist with a PhD in applied math). However, I suspect that the table simply doesn’t make sense and does not show the trends she seems to think it shows. Either way, it does not provide support for her claims.
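To make the second problem concrete, here is a minimal sketch of the two comparisons in Python. The weekly counts are invented for illustration (they are not CDC data or values from her table), and the function names are my own.

```python
from statistics import mean

def week_over_week_change(counts):
    """Change in deaths from each week to the next within one year."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]

def excess_vs_baseline(current, previous_years):
    """Deaths this year minus the average for the same week in prior years."""
    baselines = [mean(same_week) for same_week in zip(*previous_years)]
    return [cur - base for cur, base in zip(current, baselines)]

# Made-up weekly totals for three consecutive weeks (illustration only).
previous_years = [
    [1000, 990, 980],
    [1020, 1010, 1000],
    [1010, 1000, 990],
]
current_year = [1200, 1190, 1180]

print(week_over_week_change(current_year))
print(excess_vs_baseline(current_year, previous_years))
```

In this toy example the week-to-week change is slightly negative (deaths drift down by 10 per week, as they might seasonally), even though every single week sits 190 deaths above its same-week baseline. That is why the change from the previous week tells you very little about whether deaths are elevated.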
Now a normal (sane?) person would give up at this point and just let it go, but I decided to keep going. So I went to the CDC website and downloaded the data on the number of deaths by cause to see for myself if there was a decrease in other causes that offset the deaths due to COVID. You can see a couple of representative examples below (I will spare you all 13, but they tell a similar story).
It looks like most causes of death saw no change or even increases relative to previous years during the pandemic except for the last couple of weeks (where the death counts are incomplete). Heart disease in particular was higher than 2015-2017 and about the same as 2018 and 2019.
What about that narrow window in April that the speaker referenced? Well, I totaled up the number of deaths in that window and compared with the previous 5 years to see if the deaths due to other causes dropped as the speaker claimed. This is displayed in the table below (which shows deaths between April 6 and April 26).
I have colored the death totals in 2020 red if there were more deaths than any of the previous 5 years, yellow if there were roughly the same number of deaths (between the max and min) and green if the number of deaths decreased. You will notice that THERE IS NO GREEN! Every cause of death saw an increase or no change relative to previous years. I could run a hypothesis test to see if the difference is statistically significant, but there is no need because the data suggests the opposite of the hypothesis, so we can safely conclude there is no evidence for the hypothesis that COVID deaths were offset by decreases in other deaths. This outcome is precisely what you would expect to see if we were undercounting COVID deaths and if hospitals were unable to treat other conditions as effectively due to an unusual influx of patients.
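For the curious, the coloring rule is simple enough to write down. Here is a minimal sketch in Python. The function name and the sample numbers are hypothetical (they are not taken from the table), and I am assuming "decreased" means below the minimum of the previous five years.

```python
def color_for(deaths_2020, previous_totals):
    """Classify a 2020 total against the range of the previous years."""
    if deaths_2020 > max(previous_totals):
        return "red"      # more deaths than any previous year
    if deaths_2020 < min(previous_totals):
        return "green"    # fewer deaths than any previous year (assumed meaning)
    return "yellow"       # between the historical min and max

# Made-up totals for one cause over five prior years (illustration only).
previous = [36000, 36500, 37000, 37500, 38000]
print(color_for(39000, previous))  # red
print(color_for(37200, previous))  # yellow
print(color_for(35000, previous))  # green
```

If the speaker's claim were right, running a rule like this over the April window should have turned up green for causes like heart disease.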
So, I have no idea how the speaker arrived at her conclusions. I hope it was just an honest mistake, but I can find no evidence for her central claims in the CDC data. In fact, they tell precisely the story they told back in September when I first looked at the data (before her “study”). Frankly, I am shocked that anyone could watch her presentation without seeing her plot of total deaths in 2020 (my version of the same plot is below) which was dramatically higher than previous years and not be confused as to how the total deaths went up despite her claims that COVID deaths were just mislabeled deaths due to other causes.
There are a few things we should take away from this discussion:
1. If you stumble across a counter-intuitive and/or contrarian finding, it's worth taking a closer look before you believe it and ESPECIALLY before you spread it across social media. In this case, the fact that the findings contradict the CDC’s own conclusions should have been a red flag. The CDC has added some nice visualizations of the excess death data to their website that clearly contradict the findings of this study. https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm?fbclid=IwAR2AUfuYSIMeQxdXuTv7eRHljEJuOsdjq2WOWO2hpQWHKyHBlFheKTm4zfw
I recommend looking at the deaths by age group and deaths by cause group in particular to see why the claims from this study just don’t add up.
2. It is worth considering the source of an article. A blog post citing a student newspaper citing an unpublished webinar is simply not a trustworthy source. Even a cursory search would have revealed that information, but I suspect that few read beyond “JHU study”. Curiously (or perhaps not so curiously) this sort of information was absent from the discussions of the study that I found online including sources like the think tank that should have known better.
3. There is a reason that scientific studies go through a peer review process. At the very least, that process refines a paper so that its findings are presented clearly enough for other researchers to understand them. This work was very confusing to me (and I listen to technical talks and read technical papers on statistical topics on a daily basis) and would certainly have been clearer had it been reviewed by someone else. But, more importantly, the peer review process should catch obvious flaws and prevent studies with questionable methodology and unsupported conclusions from making it to print. This process is not perfect, but it is important. This so-called “study” did not go through that vetting process (and almost certainly would not have made it through in this form). Unfortunately, that did not prevent it from being passed around the internet as if it were authoritative.
So were the deaths due to COVID misclassified deaths by other causes? Of course not. There have been around 350,000 more deaths in 2020 than in a typical year. But motivated reasoning is a powerful thing, and the proliferation of this study seems to be yet another example of people believing what they want to believe and then finding “evidence” to support it.
Next up: lockdowns. As always, feel free to reach out if you have a question you would like me to explore.
PS. Some might point to the retraction of the article as evidence of a cover-up. That is certainly one explanation. Another is that the study was of low quality and should never have been published in the first place. I don't have a window into the minds of the editors of the student newspaper, but I suspect that once the article went viral and once they realized how it was being used, they found themselves in an impossible position.
PPS. I know some will respond to this evidence of excess deaths and say that the CDC data is fabricated and you can't trust it anyway. The most amusing part about this counterargument is that the same people were happy to trust the CDC data when they thought it confirmed their prior belief that the pandemic was being overblown. You can't have it both ways. Either the data is trustworthy or it isn't. You can't use data to support your beliefs and then dismiss that same data when it contradicts them.