When you make a mistake, you admit it, learn from it, and move forward. My previous COVID-19 projections used Colorado COVID case data incorrectly, and the correction I made yesterday shows a rosier picture.
I have some wonderful news to report regarding the math model and methodology of COVID-19 in Colorado that I’ve been using for the last month and had revamped in the last week:
It’s all wrong.
I’ve been using the top-line cumulative numbers from the Colorado Department of Public Health and Environment (CDPHE), reported every day at about 4 PM. However, I’ve also been assuming that each day’s change in those numbers came largely from that day, rather than from corrections to previous days’ data. And that assumption is where I went wrong.
It turns out that the CDPHE is getting major data updates from around the state for dates going back as far as February. When the CDPHE gets those updates, it backdates them to when the events actually happened, which gives a more accurate picture of the pandemic in Colorado. What that means, though, is that the data for any given day can keep changing significantly, and change our understanding of it, for weeks after that day passes.
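To make the problem concrete, here is a minimal sketch of how one day’s change in the top-line cumulative total can be split into counts dated to the new day and counts backdated to earlier days. The two snapshots and the helper name `split_topline_change` are hypothetical, invented purely for illustration; they are not CDPHE data and this is not how the CDPHE publishes its files.

```python
from datetime import date

# HYPOTHETICAL per-day case counts as they might appear in two consecutive
# daily snapshots. These numbers are invented for illustration only.
snapshot_old = {
    date(2020, 4, 18): 300,
    date(2020, 4, 19): 280,
    date(2020, 4, 20): 150,
}
snapshot_new = {
    date(2020, 4, 18): 330,   # revised upward by a backdated update
    date(2020, 4, 19): 340,   # revised upward by a backdated update
    date(2020, 4, 20): 260,   # revised upward by a backdated update
    date(2020, 4, 21): 120,   # the newly reported day
}

def split_topline_change(old, new):
    """Split the change in the cumulative total between two snapshots.

    Returns (total_change, newly_dated, backdated), where newly_dated counts
    belong to dates absent from the old snapshot and backdated counts are
    revisions to dates that were already present.
    """
    total_change = sum(new.values()) - sum(old.values())
    newly_dated = sum(n for d, n in new.items() if d not in old)
    backdated = sum(new[d] - old[d] for d in old if d in new)
    return total_change, newly_dated, backdated

total_change, newly_dated, backdated = split_topline_change(snapshot_old, snapshot_new)
print(f"Top-line change: {total_change}")
print(f"Dated to the new day: {newly_dated}")
print(f"Backdated to earlier days: {backdated}")
```

In this made-up example most of the top-line change belongs to earlier days, which is exactly the situation in which treating the daily change as “today’s new cases” goes wrong.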
I’ll study the ramifications of this in coming days, but let’s start by looking at what my error means for Colorado’s COVID-19 cases.
My original methodology (with a new model revamp given a month’s worth of data since my April 12 COVID update) showed that Colorado would likely hit a peak number of COVID-19 cases of about 85 thousand on around May 25. I originally projected that we’d reach peak hospitalizations sometime in June at more than 40 thousand cases, but I hadn’t actually carried my detailed projections that far into the future yet. And I originally projected the number of deaths to peak around May 8th at between 2000 and 5000.
The fact that the peak in deaths preceded both the peak in cases and the peak in hospitalizations meant that the model still wasn’t quite right, but it was a whole lot closer than the simple doubling model I developed early on and used until April 12.
So, not great, but nowhere near as bad as even my daily updates to my simple doubling growth model showed.
When I use my new model but with the corrected, backdated data from the CDPHE as of April 22, it shows a much rosier picture. Now total cases look to be about 47000 on around May 13th, roughly half the total from my original methodology. Total hospitalizations look like they peaked yesterday at about 2100, and total deaths look like they’ll peak in about a week at about 1000.
Now, if you recall, I said that the weird phasing of my cases, hospitalizations, and deaths indicated a problem with the model, and this model shows the same weird phasing, so we know that there’s still an underlying problem that needs to be figured out and fixed. But that’s for another day and another update. At the moment I’ll take the good news that my methodology had a major flaw and that the numbers of cases and deaths are likely to be much lower than I originally said.
But I have to add a note of caution. If you graph the curves for hospitalizations and deaths, it’ll look like hospitalizations peaked in late March and deaths peaked about 10 days ago, but I’m skeptical about the data. Partly that’s because I just got bitten by misinterpreting data, but it’s more because we know that the data is being updated and we know that there are a lot of asymptomatic carriers of COVID-19.
The time phasing problem could also mean that there are a lot of COVID-19 hospitalizations and deaths that haven’t been reported yet and that will shift the peak around. Given my quick review of how much the data has changed over just the last three days of backdated updates, this is certainly true. For example, in just the last 3 days, backdated updates increased the number of hospitalizations on March 29 by 20% (from 51 to 61), April 1 by 24% (55 to 68), April 5 by 41% (34 to 48), and so on. As we would expect, the increases get smaller the further back in time we go, but it doesn’t take too many days of updates like this to spread out the peak or move it forward by a day, a week, or more. The deaths per day show similar changes.
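As a quick check on the size of those revisions, the short sketch below recomputes the percentage change for each date from the before and after hospitalization counts quoted above. The dictionary layout and the helper name `percent_increase` are my own choices for illustration; only the six counts come from the data.

```python
# Hospitalization counts for three dates, before and after three days of
# backdated updates, taken from the counts quoted in the text.
revisions = {
    "March 29": (51, 61),
    "April 1": (55, 68),
    "April 5": (34, 48),
}

def percent_increase(before, after):
    """Percentage change from the earlier count to the revised count."""
    return 100.0 * (after - before) / before

# Percentage revision for each date, keyed the same way as `revisions`.
changes = {day: percent_increase(b, a) for day, (b, a) in revisions.items()}

for day, pct in changes.items():
    before, after = revisions[day]
    print(f"{day}: {before} -> {after} ({pct:.0f}% increase)")
```

Running the same comparison on each new daily snapshot would show how long it takes for a given date’s count to stop moving, which is one rough way to judge how far back the data can be trusted.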
We could also have problems with all three datasets, which is what I think is actually true.
What this means is that the nature of using backdated data makes the data for today very, very noisy. It means that we can’t take the backdated data at face value. It means that it will take time to extract the signal from the noise and that we should give ourselves the time we need to do that.
And it certainly means that, until we’ve extracted the signal from the noise, it’s unwise to relax social distancing and stay-at-home policies in the state of Colorado.