Newswise – DALLAS – February 8, 2021 – World health experts have long suspected that the incidence of COVID-19 is higher than reported. Now, a machine learning algorithm developed at UT Southwestern estimates that the number of COVID-19 cases in the US since the pandemic began is nearly three times the number of confirmed cases.
The algorithm, described in a study published today in PLOS ONE, provides daily updated estimates of total infections to date, as well as of how many people are currently infected, in the US and in the 50 countries most affected by the pandemic.
As of February 4, according to the model’s calculations, more than 71 million people in the United States – 21.5% of Americans – had contracted COVID-19. That compares with the substantially smaller number of publicly reported confirmed cases, fewer than 26.7 million, says Jungsik Noh, Ph.D., assistant professor at UT Southwestern in the Lyda Hill Department of Bioinformatics and the first author of the study.
Of the estimated 71 million Americans who had COVID-19, 7 million (2.1% of the US population) had current infections and were potentially contagious on February 4, according to the algorithm.
Noh’s published study is based on calculations completed in September. At that time, he reports, the actual number of cumulative cases in 25 of the 50 most affected countries was five to 20 times higher than the confirmed case counts suggested.
Looking at the current information available from the online algorithm, the estimates are now closer to the reported figures – but still much higher. On February 4, Brazil had more than 36 million cumulative cases, according to the algorithm’s estimates, almost four times the 9.4 million confirmed cases reported. France had 14 million, compared with the 3.2 million reported. And the United Kingdom had almost 25 million rather than about 4 million – more than six times as many. Mexico, an outlier, had nearly 15 times the reported number of cases – 27.6 million, rather than 1.9 million confirmed cases.
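The multiples quoted above can be checked with simple division. The short Python sketch below uses only the rounded February 4 figures given in this article (in millions); the function name `underreporting_factor` is a hypothetical label chosen for illustration and is not taken from the study.

```python
# Rounded February 4 figures quoted in the article, in millions of cases:
# (estimated cumulative infections, confirmed cases).
FIGURES_MILLIONS = {
    "Brazil": (36.0, 9.4),
    "France": (14.0, 3.2),
    "United Kingdom": (25.0, 4.0),
    "Mexico": (27.6, 1.9),
}


def underreporting_factor(estimated: float, confirmed: float) -> float:
    """Return how many times larger the estimate is than the confirmed count."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return estimated / confirmed


factors = {
    country: underreporting_factor(estimated, confirmed)
    for country, (estimated, confirmed) in FIGURES_MILLIONS.items()
}

for country, factor in factors.items():
    print(f"{country}: about {factor:.1f} times the confirmed count")
```

Because the inputs are rounded ("more than 36 million," "about 4 million"), the resulting multiples are approximate.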
“Estimates of actual infections reveal for the first time the true severity of COVID-19 in the United States and in countries around the world,” says Noh.
The algorithm uses the number of reported deaths – considered to be more accurate and complete than the number of laboratory-confirmed cases – as the basis for its calculations. It then assumes an infection fatality rate of 0.66%, based on an earlier study of the pandemic in China, and takes into account other factors, such as the average number of days from the onset of symptoms to death or recovery. It also compares its estimate with the number of confirmed cases to calculate the ratio of confirmed to estimated infections.
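The core idea of working backward from deaths can be illustrated with a deliberately simplified Python sketch. It is not the published algorithm, which is a machine learning model that accounts for more factors. It only shows the basic arithmetic of dividing deaths by the assumed 0.66% infection fatality rate and shifting them back in time. The function names, the fixed `lag_days` parameter, and the toy death counts are all assumptions made for illustration.

```python
from itertools import accumulate
from typing import List

IFR = 0.0066  # infection fatality rate of 0.66% quoted in the article


def back_calculate_infections(daily_deaths: List[float], lag_days: int,
                              ifr: float = IFR) -> List[float]:
    """Estimate daily new infections from daily reported deaths.

    Deaths on day t are attributed to infections on day t - lag_days, so the
    result is lag_days shorter than the input (the most recent days cannot
    be estimated yet). A single fixed lag is a simplifying assumption.
    """
    if lag_days < 0:
        raise ValueError("lag_days must be non-negative")
    if not 0 < ifr <= 1:
        raise ValueError("ifr must be in (0, 1]")
    return [deaths / ifr for deaths in daily_deaths[lag_days:]]


def confirmed_to_estimated_ratio(confirmed_total: float,
                                 estimated_total: float) -> float:
    """Return the share of estimated infections that were confirmed."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return confirmed_total / estimated_total


# Made-up toy series, not real data; the 3-day lag only keeps the example short.
toy_deaths = [0, 0, 0, 66, 132, 66]
new_infections = back_calculate_infections(toy_deaths, lag_days=3)
cumulative_infections = list(accumulate(new_infections))
print([round(x) for x in cumulative_infections])
```

With the US figures quoted in this article, `confirmed_to_estimated_ratio(26.7, 71)` gives roughly 0.38, meaning that a little over a third of estimated infections appear in the confirmed counts.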
There are still many uncertainties about COVID-19 – especially the fatality rate – and therefore the estimates are rough, says Noh. But he believes the model’s estimates are more accurate, and miss fewer cases, than the confirmed case counts currently used to guide public health policies. It is important to have a more comprehensive estimate of the prevalence of the disease, adds Noh.
“These are critical statistics on the severity of COVID-19 in each region. Knowing the true severity in different regions will help us fight effectively against the spread of the virus,” he explains. “The currently infected population is the cause of future infections and deaths. Its actual size in a region is a crucial variable needed when determining the severity of COVID-19 and building strategies against regional outbreaks.”
In the US, infection rates vary widely by state. California has had nearly 7 million infections since the beginning of the pandemic, compared with 5.7 million in New York, according to the algorithm’s projections for February 4. The model also estimates that California had 1.3 million active cases at the time, or 3.4% of the state’s population.
Other estimates from the model for February 4: in Pennsylvania, 11.2% of the population had current infections – the highest rate of any state – compared with a low of 0.15% in Minnesota. In New York, an early hot spot, 528,000 people had active infections, or about 2.7% of the population. Meanwhile, in Texas, 2.3% had current infections.
Noh says he developed the algorithm last summer while trying to decide whether to send his sixth-grade daughter back to school in person. He could not find the data he needed to gauge safety anywhere, he says.
Once he had built the algorithm, he discovered that the area in which he lived had a current infection rate of about 1%. So his daughter went to school.
Noh verified his findings by comparing his results with prevalence rates found in several studies that used blood tests to check for antibodies to the SARS-CoV-2 virus, which causes COVID-19. For most areas tested, the algorithm’s infection estimates corresponded closely to the percentage of people who tested positive for antibodies, according to the PLOS ONE study.
The online model uses COVID-19 death data from Johns Hopkins University and The COVID Tracking Project, a volunteer organization founded to help track COVID-19, to run its daily updates. However, the estimates published in the PLOS ONE study date from September 3. At that time, about 10% of the US population had been infected with COVID-19, based on Noh’s algorithm.
Gaudenz Danuser, Ph.D., chair of the Lyda Hill Department of Bioinformatics and professor of cell biology, was the senior author of the study. He also holds the Patrick E. Haggerty Distinguished Chair in Basic Biomedical Science.
Funding came from Lyda Hill Philanthropies.