How do you know that something you are looking for is not there? Looking for a needle in a haystack is fundamentally easy – however laborious and tedious – if you know it’s definitely there. Looking for something, not finding it, and therefore concluding it does not exist is a different problem.
So what does the lack of new cases tell us about the true frequency of infections in the Victorian population? Or, to put it another way, what is the maximum number of infections that could still lurk out there undetected?
These are what statisticians call sampling problems. We do not test everyone, but instead rely on people with symptoms to come forward for testing. If everyone with symptoms gets themselves tested, this should give us a good idea of how many cases there are.
There are caveats: some people do not come forward for testing while others get tested several times; cases tend to cluster in families. But we can account for such uncertainties in the analysis framework that we use below.
Plenty of people are still getting tested. People check the Department of Health and Human Services’ social media feeds to see the daily “0” (the celebrated “doughnut”); some are concerned about the number of tests performed each day; and many people seriously worry about the chance of a return of the virus.
Working out the probabilities
However, we can estimate the probability the virus is still out there in Victoria. There are different ways to do it, but ultimately they all give very similar results.
One good way is to adopt a “Bayesian” approach, which also lets us work out how accurate the estimate is likely to be, given the uncertainties in our assumptions and inputs. We could do the calculations exactly (using a paper and pencil, or computer algebra software), but for making predictions we usually use simulations.
For our estimate we need to know a few numbers:
- N: the total number of people in Victoria (about 6.5 million)
- n: the number of tests carried out
- p₀: what we think (or fear) the frequency of infected people in the Victorian population is, before we look at the testing data.
With this we can estimate p, the frequency of cases, after taking into account that we found 0 positives among n tests. A value of p = 1 would mean everybody in Victoria has COVID, and p = 0 would mean nobody does.
Running the numbers
In the Bayesian framework we calculate p as a compromise between our prior knowledge (or beliefs) and the new information gleaned from the data.
The prior forces us to state explicitly what we expect or believe reality to look like. And because it is a probability it also accounts for our level of certainty or ignorance. When possible we can, for example, use information from previous studies to generate the prior.
To be cautious, we will start with the very pessimistic assumption that an average of 1% of people in Victoria are actually infected. (We can be confident the real number is much smaller, but we are interested in a worst-case scenario.)
We put this 1% figure into our model as a probability distribution (called a “beta distribution”) that produces variable results with an average of 0.01 (which is another way of writing 1%).
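As a rough illustration of what such a prior can look like, here is a minimal Python sketch. Only the mean of 0.01 comes from the text; the helper name `beta_shape_from_mean` and the `concentration` value (the sum of the two shape parameters) are hypothetical choices made for this example, picked so that most of the prior's mass sits close to zero.

```python
import random
import statistics


def beta_shape_from_mean(mean, concentration):
    """Return (alpha, beta) of a beta distribution with the requested mean.

    `concentration` is alpha + beta. Values well below 1 push most of the
    probability mass towards 0 and 1; large values pile it up near the mean.
    """
    return mean * concentration, (1.0 - mean) * concentration


# Hypothetical choice: the text fixes only the mean of the prior (0.01).
alpha0, beta0 = beta_shape_from_mean(mean=0.01, concentration=0.01)

rng = random.Random(2020)  # fixed seed so the run is repeatable
draws = [rng.betavariate(alpha0, beta0) for _ in range(100_000)]

print("alpha, beta:", alpha0, beta0)
print("average of the draws:", statistics.fmean(draws))
print("share of draws below 0.001:", sum(d < 0.001 for d in draws) / len(draws))
```

Changing `concentration` keeps the average at 0.01 but changes how the draws are spread around it, which is why the shape of the prior has to be stated alongside its mean.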
If the true frequency of infection is p, the probability of finding 0 positives among n tests is (1 – p)ⁿ. The bigger p is, the more people have the virus, and the smaller the chances we would see 0 positive results.
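To get a feel for how quickly this probability shrinks, here is a small Python sketch. It assumes that every test is an independent random draw from the population and that tests are perfectly accurate, simplifications made for this example. The function name `prob_all_negative` and the list of trial frequencies are ours.

```python
import math


def prob_all_negative(p, n):
    """Probability that n independent tests are all negative at frequency p."""
    if p >= 1.0:
        # Everybody is infected: an all-negative run is impossible
        # (unless no tests were done at all).
        return 0.0 if n > 0 else 1.0
    # log1p keeps full precision when p is very small.
    return math.exp(n * math.log1p(-p))


n_tests = 19_850  # tests reported for October 31, 2020

for p in (1e-6, 1e-5, 1e-4, 1e-3):
    chance = prob_all_negative(p, n_tests)
    print(f"p = {p:g}: chance of 0 positives in {n_tests} tests = {chance:.3g}")
```

The complement, 1 minus this value, is the chance that at least one of the n tests comes back positive.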
With these two ingredients, the prior knowledge and the information from the data, we can now estimate the true frequency of infection in the Victorian population.
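For a beta prior combined with a run of all-negative tests, the update can be written down directly: a Beta(α, β) prior becomes a Beta(α, β + n) posterior, with average α / (α + β + n). The sketch below applies this to the two test counts quoted in this article. The prior shape parameters are hypothetical (the text gives only the prior mean of 0.01), so the printed values illustrate the mechanics and should not be expected to match the reported figures exactly.

```python
def posterior_shape(alpha0, beta0, n_negative):
    """Update a Beta(alpha0, beta0) prior with n_negative all-negative tests."""
    # Each negative test contributes a factor (1 - p), which adds 1 to beta.
    return alpha0, beta0 + n_negative


def beta_mean(alpha, beta):
    """Average of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior shape with mean 0.01 and most of its mass near zero.
alpha0, beta0 = 1e-4, 0.0099

scenarios = (
    ("October 31 only", 19_850),
    ("October 31 to December 2", 438_950),
)
for label, n_tests in scenarios:
    alpha_post, beta_post = posterior_shape(alpha0, beta0, n_tests)
    print(f"{label}: posterior mean of p = {beta_mean(alpha_post, beta_post):.2e}")
```

Because n only ever enters through β + n, every additional negative test pulls the estimate further towards zero.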
On the first day of the ongoing sequence of zero cases, October 31, 2020, there were 19,850 tests performed (thus n=19,850). The expected value for the true frequency of infection in Victoria on that day was therefore a tiny 0.0000000041 (4.1 × 10⁻⁹). We ran a million simulations of this scenario, and only in 260 instances were there any cases at all left in the population, with a maximum of 986 possible hidden cases.
Now, after more than a month of zero cases and a total of 438,950 tests between October 31 and December 2, the estimated frequency has gone down even further to 0.00000000011 (1.1 × 10⁻¹⁰). The highest number of lurking infections in one million simulations is now 39 cases (and only 132 of our million simulations contained any cases at all).
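A simulation of this kind can be sketched in a few lines of Python. This is a simplified stand-in rather than the code behind the figures above, and it rests on several assumptions of ours: the prior shape parameters are hypothetical, every test is treated as a different person, and the number of hidden cases among untested people is drawn from a Poisson approximation (with a normal approximation for large counts). It also runs fewer simulations than the million used here, to keep the run time short.

```python
import math
import random


def sample_hidden_cases(rng, expected_cases):
    """Draw an approximate count of hidden cases with the given expectation."""
    if expected_cases < 30.0:
        # Knuth's multiplication method for a Poisson draw (small means only).
        threshold = math.exp(-expected_cases)
        count, product = 0, rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        return count
    # Normal approximation to the Poisson for larger means.
    return max(0, round(rng.gauss(expected_cases, math.sqrt(expected_cases))))


def simulate_hidden_cases(alpha, beta, untested, n_sims, seed=1):
    """Draw p from Beta(alpha, beta), then a case count among untested people."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        p = rng.betavariate(alpha, beta)
        counts.append(sample_hidden_cases(rng, p * untested))
    return counts


population = 6_500_000  # N, from the text
n_tests = 438_950       # tests between October 31 and December 2
# Hypothetical prior Beta(1e-4, 0.0099), updated with n_tests negative results.
alpha_post, beta_post = 1e-4, 0.0099 + n_tests

counts = simulate_hidden_cases(alpha_post, beta_post, population - n_tests, n_sims=200_000)
print("simulations with at least one hidden case:", sum(c > 0 for c in counts))
print("largest number of hidden cases in any simulation:", max(counts))
```

The two printed numbers play the same role as the counts quoted above, but they depend on the assumed prior shape and the random seed, so they will differ from the reported values.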
What we can learn from this
Three points are worth considering, especially when applying this approach in the context of other states and territories, or Australia as a whole.
- These estimates are based on assumptions, but we can test how changes (or errors) in our assumptions affect the analysis. In this case relatively little: it is extremely unlikely there is even a single COVID case left in the Victorian community.
- We can also ask when we would be likely to detect cases of COVID-19 if it re-enters the community. The current testing regime turns out to be remarkably sensitive. Even with only 5,000 randomly(!) administered tests we would have a better than 50-50 chance of detecting a case if only 0.0014% of Victorians – or about 91 people – were (asymptomatically) infected. If people with symptoms continue to get tested even single cases will be detected and that is what we want.
- Testing is therefore important and the key to prolonged suppression. The simplistic statement that you get more cases if you do more testing fails to take into account just how important testing is to control the disease, especially in the early and the final suppression stages. For as long as testing is easily accessible throughout the state and used by (a large fraction of) people exhibiting COVID-like symptoms we should be able to detect and quell any resurgence, even before a vaccine becomes available.
We were arguably lucky to get to zero cases, but we can be very confident that we have now eliminated COVID-19 in the community. The absence of evidence for coronavirus infections has slowly become evidence for the absence of the virus from Victoria.
Images used courtesy of Pexels/Chokniti Khongchum