May 6 2020
The Math of COVID19, And Factories
Whether we like it or not, the past months have given us a crash course in epidemiology. COVID19 has taken terms like reproduction number, herd immunity, social distancing, or flattening the curve from research literature to daily news and instructions for visitors to California State Parks.
We are in the middle of a pandemic we have partially tamed by putting the economy in a coma. This pandemic has already killed more Americans in two months than the Vietnam war in 20 years and we are facing the unprecedented challenge of restarting factories in this context.
Among the many things to learn in a hurry, are what epidemiologist Adam Kucharski calls the rules of contagion, as they apply to the people who work in a factory and its surrounding community.
Quality control is the closest most of us in Manufacturing ever get to serious statistics/data science. It’s not the same domain as epidemiology, and there is little crossover in tools or methods. This is to share what I have just learned about this topic. I welcome any comment that might correct misconceptions on my part or otherwise enlighten us.
Contents
 Maps
 Contagion over time
 Restarting Factories Safely
 Further Reading
Maps
Statistical epidemiology has a long history, and it started with mapping the geographical locations of cases. It is easier to do now than ever before and still useful in guiding action, whether at the level of a country or a factory.
The Cholera Map of London
In Visual Explanations, Edward Tufte cited the work of John Snow, a doctor who plotted the locations of cholera deaths in London in September 1854 and connected them to the Broad Street water pump. Gary Kapanowski also recommended Steven Johnson’s 2006 book The Ghost Map about the London cholera epidemic.
This was not about models of the dynamics of epidemics but about mapping the geographical locations of infections, visually identifying clusters, and linking them to other features like water pumps.
COVID19 Maps Today
Mapping is easier today and used to support the same kind of detective work John Snow did in London 166 years ago. The following heat map of COVID19 in the US was in the New York Times on 5/4.
If you zoom in on Ohio, you find the highest number of cases per capita in Marion and Pickaway county, on news5cleveland.com. By zooming and investigating further, you find that both counties have large jails where the majority of inmates are infected.
Maps of Fever Anomalies
Mapping is also used, for example, for identifying COVID19 clusters through temperature anomalies detected by connected thermometers. The Kinsa company has sold 2 million connected medical thermometers in the US, and collects 150,000 temperature readings daily. They claim that, after filtering out the noise due to seasonal flu, they can see upcoming COVID19 clusters before the CDC does, on literal heat maps.
Maps of Factory Cases
If, in a factory, you read every employee’s temperature when they arrive and systematically test any employee with COVID19 symptoms, you have a list of confirmed cases within the workforce and you can locate the work location of each on a map of the plant.
If this map shows you that a cluster is growing in the welding shop or in a segment of final assembly, you can zoom in on it, test everyone working in it, disinfect it, reinforce the personal protection equipment in it, and reengineer the line…
Contagion over time
In The Rules of Contagion (2020), Adam Kucharski introduces Ronald Ross as a pioneer in mathematical models of contagion. The charts used for COVID19 in connection with expressions like “flattening the curve” or “herd immunity” are based on refinements of the SIR model developed by Ross disciples Kermack and McKendrick in 1927 in the UK, about the same time Shewhart was developing Control Charts.
For current research on this subject, see, for example, the MIDAS network. MIDAS stands for “Models of Infectious Disease Agent Study” and is a network of academics funded by the US National Institutes of Health.
For charts and data about COVID19, you can go to the Institute for Health Metrics and Evaluation (IHME) at the University of Washington or Our World in Data.
Exponential Growth At The Outset
In the beginning of an epidemic, the number of infected people grows exponentially, and news reports commonly used this term about COVID19 in February and March, 2020. It means that the number of infected people grows at a rate that is proportional to the number already infected.
It’s like instantly compounded interest, as each infected person joins the group that causes new infections. In the charts published about COVID19, the growth has been expressed in terms of the number of days it takes to double, which gives you a straight line in a semilogarithmic plot.
The Katz/SangerKatz Chart
The following chart, by Josh Katz and Margot SangerKatz, was used to show deaths rather than infections in multiple countries, with the straight gray lines corresponding to different doubling intervals.
Even in the short period of time shown here, the actuals plots do not stay straight, meaning that they veer away from exponential growth. It is a model for a trend, treating the number of infected people like a continuous variable when, in fact, it is a random integer.
The Charts From “Our World In Data”
Our World In Data publishes many charts and data sets about COVID19 that are free to download and updated daily. Online, you can select the countries for which you wish to see the plots. They also included the straight lines corresponding to exponential growth at various speeds.
Why plot deaths rather than infections? As COVID19 deaths are better documented than infections, they can be plotted more accurately. The death toll is a lagging indicator of the infection rate, proportional on a given day to the number of infections roughly two weeks earlier.
While not all COVID19 deaths are recorded as such and some deaths from other causes may be misclassified as due to COVID19, deaths are still more accurately recorded, in many different countries.
Many virus carriers don’t know they are infected, don’t experience any symptoms and don’t see a doctor, while they are in fact contaminating others. We don’t know how many there are. See more details below in the discussion of recovery and mortality.
The Exponential Model
In the early stages, between times t and t+dt , the number I\left ( t \right ) of infected people grows by \beta I\left ( t \right )dt and shrinks as infected people recover or die by \gamma I\left ( t \right )dt , so that the net change is dI(t) = \left(\beta  \gamma\right)I(t)dt = \left(R_{0}  1\right)\gamma I(t)dt , so that
I(t) = I(0) exp\left[\left(R_{0}  1\right)\gamma t\right]
with R_{0} = \frac{\beta}{\gamma} , pronounced “are nought” and known as the epidemic’s reproduction number.
Beyond the very early stages, the exponential model breaks down because it takes no account of the decrease of the susceptible population with each infection.
R_{0} : An epidemic’s reproduction number
R_{0} is a key parameter of the disease in a specific human and natural environment. If R_{0} > 1 , the introduction of a small number I(0) of infected people starts an epidemic. On the other hand, if R_{0} < 1 , I(t) quickly peters down to 0 and the disease “vanishes.”
Published Estimates
R_{0} s are estimated and published by the US Centers for Disease Control (CDC) for various epidemics of various diseases. They are a function of both a disease and a population. How a virus to spread and how infected people recover varies with the lifestyle of the population and its living environment.
These differences can be an order of magnitude. For example, US CDC publications rate the MERS coronavirus, a predecessor of COVID19, with R_{0} = 0.45 in Saudi Arabia and R_{0} = 8.1 in Korea. This is 18 times higher. Statements found in the press, like “The flu has an R0 value of about 1.3…” make no sense unless the context is specified.
COVID19’s R_{0} in various parts of the US is not yet known. For the Wuhan epidemic, the Chinese government announced it between 2 and 3, already a wide margin. The CDC, however, now estimates it with a median of 5.7 and a 95\% confidence interval of 3.8 to 8.9 .
In the US, we can expect different R_{0} values for a factory, a state, and the whole country, and ranges of values for each. These wide, nonoverlapping ranges for COVID19 as well as for other diseases suggest that R_{0} is not an easy parameter to estimate.
Undisclosed Estimation Methods
Many publications are coy about the way R_{0} is estimated. Healthline, for example, has an article entitled “How is the R0 of a disease calculated?” that doesn’t tell you. In addition, it mistakenly describes R_{0} as a parameter of the disease only. The School of Public Health at the University of Michigandoes not say much more:
…there’s uncertainty about many of the factors that go into estimating R0, such as in estimating the number of cases, especially early on in an epidemic. Based on these current estimates, projections of the future number of cases of coronavirus are fraught with high levels of uncertainty and will likely be somewhat inaccurate.
The difficulties arise for a number of reasons.
 First, the basic properties of this viral pathogen – like the infectious period – are as yet unknown.
 Second, researchers don’t know how many mild cases or infections that don’t result in symptoms have been missed by surveillance but nevertheless are spreading the disease…
In other words, we don’t have the data needed to estimate R_{0} when an epidemic starts and this information would be the most useful. We can revisit the question after introducing a more elaborate model, that is applicable beyond the initial stage.
Within the workforce of a factory, this information is easier to obtain than in a larger population. We will discuss this below. First, let us introduce the most commonly used model of infectious diseases.
The SIR Model
“SIR” stands for “Susceptible, Infected, and Recovered.” The 1927 SIR model ignores the variations in the overall population N due to births, deaths by causes other than the epidemic, and migrations, and breaks it down into three compartments:
 I infected and contagious individuals
 S susceptible individuals, who can contract the disease through encounters with infected, contagious individuals
 R individuals who have been infected but are no longer contagious as a result of recovery or death and, if alive, are now immune.
At all times, N = I\left( t \right) + S\left( t \right)+ R\left( t \right)
Whether the assumption that recovered individuals are immune holds for COVID19 is currently unknown but it has held for most infectious diseases in the past.
The SIR Model, Explained With High School Math
Like Shewhart with SPC, the epidemiologists of the 1920s had to keep their models simple enough for manual calculations. Today, all sorts of refinements are possible, and the randomness of contagion can be simulated to obtain ranges of outcomes rather than only expected values.
When I << S , we can use the exponential model, as when there are 10 infected individuals in the US population of 330 million. When considering the spread of the disease in a smaller group, like the 3,500 employees of a factory, the dynamics of the three compartments cannot be ignored.
The SIR Model In Equations
Between times t and t + dt , there are I\times S possible encounters between infected and susceptible individuals. A fraction of those possible encounters actually occurs, and a fraction of the actual encounters result in an infection, as shown in the following picture.
With the additional simplifying assumption that all the infected people are equally infectious, the number of susceptibles decreases as follows:
\frac{dS} {dt} = \alpha I S
Near t = 0, S \approx N and, for consistency with the exponential approximation, we choose to use the percapita rate \beta :
\alpha = \frac{\beta}{N}
Meanwhile, a fraction of the infected people recovers or dies, therefore:
\frac{dR} {dt} = \gamma I
This is also a simplifying assumption because if, for example, an infection takes 25 days to run its course, then there will be no recovery at all for the first 25 days after the first infection.
As above the number of infected individuals is increased by the number of new infections and decreased by the number of recoveries. Therefore, the rate of net change is
\frac{dI} {dt} = \frac{\beta}{N} I S \gamma I = \left(R_{0}\frac{S}{N} 1\right)\gamma I
Instead of \gamma, the recovery process is often described more intuitively in terms of the infectious period I_{p} = \frac{1}{\gamma} .
Backoftheenvelope Conclusions About The SIR Model
The High School math above leaves us with a system of three differential equations:
\frac{dS} {dt} = R_{0} \frac{S}{N}\gamma I\frac{dI} {dt} = \left(R_{0}\frac{S}{N} 1\right)\gamma I
\frac{dR} {dt} = \gamma I
with the constraint S(t) + I(t) + R(t) = N for all t .
Even though these equations look simple, the system does not have a solution in the form of a simple formula involving log, exp, polynomials or trig functions. The solutions you see plotted are numerical approximations.
Before giving up on the High School math, however, we can still, on the back of an envelope, figure out the shape of the infection curve and the proportion of the population that is infected by the time the epidemic ends. The following chart shows what we can find this way.
For an epidemic to even start, we must have at least one infected person in the population. Therefore, I\left(0\right)>0 . The population N is large compared to I\left(0\right) , and we can assume that, initially, no one is immune, so that S\left(0\right) \approx N , R\left(0\right) = 0 and
\frac{dI} {dt}\left(0\right)\approx \left(R_{0} 1\right)\gamma I = \left(\beta  \gamma\right) I\left(0\right)
Meaning that, if R_{0} > 1 the rise in the number of infected people is initially exponential at rate \beta  \gamma . Otherwise, the epidemic doesn’t start because the Infected recover too fast.
Beyond t=0, as S\left(t \right) decreases,
\frac{dI} {dt}\left(t\right) = \left(R_{0}\frac{S}{N} 1\right)\gamma I\left(t\right) < \left(\beta  \gamma\right) I\left(t\right)
and the number infected I(t) falls off the exponential growth to reach a peak when S\left(t\right) = \frac{N}{R_{0}} . At that point, there are no longer enough Susceptibles available to sustain growth in the infection.
Beyond the peak, I decreases faster than a negative exponential as S decreases. If S_{1} = S\left(t_{1}\right) for any t_{1} past the peak, then, for t > t_{1} ,
\frac{dI} {dt}\left(t\right) = \left(R_{0}\frac{S\left(t\right)}{N} 1\right)\gamma I\left(t\right) < \left(R_{0}\frac{S_{1}}{N} 1\right)\gamma I\left(t\right) < 0
which means that, beyond t_{1} , the number of Infected drops faster than exp\left[\left(R_{0}\frac{S_{1}}{N} 1\right)\gamma I\left(t\right)\right] .
Susceptibles and Recovered
For Susceptibles and Recovered, the curves start out flat, because
\frac{dS} {dt}\left(0\right) = \frac{dR} {dt}\left(0\right) \approx 0
It’s not exactly 0 because I\left(0\right) > 0 but it is close to it because the number I\left(0\right) of infected people at the outset is small.
They also end up flat because I_{\infty} = \lim_{t\rightarrow \infty}\left [I\left ( t \right )\right ] = 0 . To determine where they end up, we can examine S as a function of R .
\frac{dS} {dR} = \frac{\beta}{\gamma}\times \frac{S}{N} = R_{0}\frac{S}{N}
and therefore
S\left ( R \right ) = N\times exp\left[\frac{R_{0}}{N}\times R\right]
If we take N = S(t) + I(t) + R(t) to the limit when t\rightarrow \infty we get N = S_{\infty} + R_{\infty} and therefore
S_{\infty} = NR_{\infty} = N\times exp\left[\frac{R_{0}}{N}\times R_{\infty}\right]
If r_{\infty} = \frac{R_{\infty}}{N} is the proportion of recovered individuals at the end of the epidemic, it satisfies the equation
1 r_{\infty}  exp\left[R_{0}\times r_{\infty}\right] = 0
We can solve this numerically to obtain s_{\infty} = 1 r_{\infty} as a function of R_{0} and plot it:
Since “Recovered” means “previously infected but no longer contagious,” it includes those who die from the disease and those who suffer permanent ill effects. The proportion s_{\infty} of people in the population who live through the epidemic without every being infected is, therefore, the most interesting, and it drops fast at R_{0} increases.
With R_{0} in the range estimated by the CDC for COVID19 in Wuhan, in the absence of countermeasures, the entire population would have been infected; by the lower estimates of the Chinese government, between 75% and 95% would have been.
We now know a fair amount about the way the epidemic plays out but we still don’t know when the peak happens or how high it is. We’ll need more powerful tools to find out.
Recovery and Mortality
In the SIR model, the Recovered compartment includes those who died from the disease. In everyday speech, death is certainly not a special case of “recovery” and I suspect the term was chosen to fit the mnemonic acronym “S.I.R.”
We can assume that the final number of deaths d_{\infty} of the epidemic proportional r_{\infty} . For COVID19, we don’t have the data needed to evaluate the ratio, for lack of estimates of the number of infected people.
Mortality Data
In March and April, 2020, COVID19 killed more Americans than the Vietnam war in 20 years. We can read in the above chart that reducing R_{0} from 3 to 1.5 would reduce r_{\infty} by 29% and therefore d_{\infty} by the same ratio. In other words, “flattening the curve” doesn’t just smooth the workload of hospitals, it also directly saves tens of thousands of lives just in the US.
Most countries, record times and causes of death. This data, however, is a human judgment call, and there are many reports asserting that underreporting of COVID19 as a cause of death.
Case Fatality Rate Versus Infected Fatality Rate
The most commonly cited metric of mortality from COVID19 is the Case Fatality Rate (CFR), the probability that a person diagnosed with the disease will die of it. What we would really like to know is the Infected Fatality Rate (IFR), the probability that a person infected with it will die of it.
The health care system knows who was diagnosed with the disease and when. Patients who die have been diagnosed before, and we should be able to know when. With this data, for each day in the past, we should know how many people were diagnosed that day, and how many of them died. From these numbers, we should be able to track the CFR over time without bias.
Infected People
Many infected people don’t know it and recover without showing any symptoms. They transmit the virus but leave no trace in the health care system. Testing random samples in the population would provide the data to estimate their number but this has yet to be done.
With the flu also, many catch the disease and selfmedicate without leaving a record, like the Emily character in The Devil Wears Prada:
Based on a 2015 paper, the CDC does not attempt to count the Emilys:
“For the postpandemic seasons, we lacked data on nonhospitalized illnesses and did not estimate this number of influenza illnesses in the population.”
Using the CFRs of two diseases is an applestoapples comparison as long as it is clear that they are not IFRs. As reported in Our World in Data, as estimated by the US CDC, the CFR for seasonal flu is between 0.1\% and 0.2\% ; for COVID19, 5.84\% as of 5/4/2020 per European CDC.
What creates confusion is mixing CFRs and IFRs. On May 2, on CNN, I heard Dr. Sanjay Gupta misspeak when, after discussing the rate of deaths per diagnosed case of COVID19, he said “with the flu, if you are infected, you have 0.1% chance of dying.”
Plotting The S,I, and R Curves
Within the SIR model, to see when the peak occurs and how high it is, you need to solve the system of equations that express the dynamics of the epidemic.
The shinySIR package in R is one of many resources that let you plot epidemic profiles for ranges of R_{0} and I_{p}. In the first example, with a low R_{0}, the infection curve could be confused with a classical bell shape. The second example, with a higher R_{0}, shows that it is in fact both pointy and skewed.
As R_{0} is the ratio of two rates, it is independent of whichever unit of time you use. I_{p} , on the other hand, is a period and therefore expressed in units of time. The xaxis in the following charts just says “Time.” We can assume it to be days.
In the first example, the epidemic dies down before all the Susceptibles are Infected; in the second, the infection reaches everyone.
Bracketing The Parameters
The main limitation of the SIR model is that it treats random numbers of people like continuous variables determined by differential equations. In 1927, Kermack and McKendrick couldn’t go beyond this but we can with an ordinary laptop computer.
S, I and R truly are random integers, and the SIR model gives their expected values. Then:
 \beta \frac{S}{N}dt is the probability of S dropping by 1 and I rising by 1 between t and t+dt and

\gamma dt , likewise, is the probability that I drop by 1 and R rise by 1 .
By dividing the time in sufficiently tiny \Delta ts we can use this to simulate multiple epidemic trajectories and estimate the distributions of S\left(t\right) , I\left(t\right) and R\left(t\right) around the expected values provided by the model.
It is also possible to fit an SIR model to actual data. The rt.live site provides confidence intervals for the reproduction number R_{t} at time t for all 50 states in the US.
If nothing changes from t= 0 onward, then R_{t} = R_{0}\times \frac{S\left(t\right)}{N} . If, on the other hand, public health countermeasures, or the weather, change the dynamics of the epidemic, this shows in the data on new cases, and rt.live refines the estimates by a method published by Bettencourt and Ribeiro in 2008. It provides both a most likely value for of R_{t} and a confidence interval.
If R_{t} > 1 , I is grows; otherwise, it falls. The rt.live snapshot for 4/28/2020 is as follows:
The best value is for Michigan; the worst, for Nebraska. The time series for R_{t} provided for these states are as follows:
The people behind rt.live are Instagram cofounders Kevin Systrom and Mike Krieger, and Gradient founder Thomas Vladeck. Their raw data and Python code are available to download.
Refining The Model
The IT available to us also allows us to remove the simplifying assumptions of the SIR model. When using simulations, we are free to be more realistic:
 We don’t have to assume a constant population N. Mothers can have babies, people can die of other causes, and migrants can join or leave.
 People can be less contagious when asymptomatic, more with mild symptoms, and most with acute symptoms.
 The immunity of recovered patients can have a time limit.
 Transmission may be less likely in hot than in cold weather.
 …
Just because we can, however, doesn’t mean we should. It all depends on what is gained as a result of using a more realistic model, in terms of the decisions it supports.
Flattening The Curve
“Flattening the curve” is the stated goal of the countermeasures taken worldwide, in effect to reduce the frequency of encounters between Infected and Susceptible individuals. The US Centers for Disease Control (CDC) published the following picture to explain the rationale:
The shape of these curves is a clear indication that, after 93 years, the SIR model and its subsequent refinements have not lost their relevance.
The primary motivation given for flattening the curve is to avoid overloading the health care system with a sharp peak in infections. In such a peak, more patients die from being denied care for lack of equipment and health care workers.
There are other reasons to flatten the curve that are not commonly mentioned:
 It reduces the total number of infections and therefore deaths, as discussed above about s_{\infty} and r_{\infty} .
 Those of us who will die from COVID19 would rather do it later than sooner.
 Flattening the curve buys time to find cures, treatments, and vaccines.
SIR Curves And The Spanish Flu of 19181919
On 3/28/2020, the National Geographic published a series of charts on Spanish flu death rate in US cities in 19181919, highlighting, for each city, the period of time during which they applied “social distancing measures,” although I doubt the term was used at the time. These charts are commonly used as evidence of the effectiveness of today’s social distancing against COVID19.
The plots are not of infections but of deaths that occurred in excess of what would have been usual numbers at these times and locations. For comparisons between cities of unequal populations, the quantity plotted is a number of deaths per 100,000 inhabitants. Using the same method, Josh Katz and Margot SangerKatz have plotted the differences between the actual death counts March 15 through May 5, 2020 and the expected values based on prior years’ counts for the same periods in the same locations. They found, for example, that New York City had about 23,000 excess deaths during that period, compared to the 18,000 officially imputed to COVID19.
The visual similarities between the Spanish flu charts and the SIR model suggest that, for all its simplifying assumptions, it does capture the fundamental dynamics of contagion within a population. The CDC today still uses R_{0} as a key characteristic of an epidemic.
The worst and the best cities were Philadelphia and St. Louis, as shown in the following picture:
This article strikes me as valid and informative, with a few caveats:
 The article does not cite the sources of the data and does not explain what specific social distancing measures each city used.
 Since we don’t know the details of what each city did, the difference between the white and brown areas on the chart are between doing nothing and doing something.
 The populations of Philadelphia and StLouis in 1918 were respectively 1.5 million and 687,000 and therefore there is no doubt that the differences in death rates could not have arisen from the same probability of death.
 The brown zone in StLouis did start about 5 weeks after the first death versus 3 weeks for Philadelphia. The brown zone in StLouis started when the death rate was lower and rising slower. In other words, Philadelphia responded to a blaring red alert; StLouis, to a blinking amber light. The outcome shows that it was proper to act at that point.
 The cities that had second bumps obviously did something wrong but the charts don’t contain enough information to figure out what it was.
Restarting Factories Safely
You mitigate an epidemic by reducing R_{0} , which means that you either reduce the infection rate \beta or increase the recovery rate \gamma . For COVID19, we currently have neither a vaccine that would bring \beta to 0 nor cures to boost \gamma .
Until we do, all we can do is reduce the number of encounters between Infected and Susceptible individuals. Testing, contact tracing, isolation, quarantines, social distancing, personal protective equipment, etc., are all countermeasures intended to reduce \beta .
The evidence so far is that they work, albeit at the cost of putting the economy in an induced coma, causing great hardship including famines.
The challenge to manufacturers today is to bring the factories back up without losing the gains so painfully achieved. The workforce of a factory is a population, with its Susceptibles, Infected, and Recovered compartments.
With appropriate monitoring, you can track their evolution over time and use the SIR model or its more sophisticated descendants to anticipate the evolution of the epidemic within this population.
A factory is not, however, an isolated population. Family members can infect employees who then bring the disease to work; employees can catch the virus at work and contaminate their families. If you measure the rate of infection in the community at large, then you can compare it with the rate among employees.
Measuring Infection Rates By Random Sample Tests
We predict election results from samples of 1,000 voters and audiences of TV shows from 5,000 viewers. We should be able to know the COVID19 infection rate of 330 million Americans with a small margin of error from testing a random sample of a few thousand, as recommended by Dartmouth Profs. Daniel Rockmore and Michael Herron. It is difficult to understand why this has not been done yet.
If you are short on test kits, it’s understandable to reserve them for people with symptoms. However, a small number of test kits is all we need to find out what proportion of the population is infected, including contagious people with no symptoms, which is also vital for public health.
This kind of testing will increase the estimates of the number of infected people by an unknown factor, possibly frighteningly large.
In Vò, Italy, they tested the entire population of 3,500, and were able to stop contagion completely. Testing a random sample would not achieve this. All it would do is give us a confidence interval for the rate of infection in the population.
The closest to random sample tests in the US have been two studies conducted respectively out of Stanford University about Santa Clara County and USC about Los Angeles County.
Both studies tested for antibodies, revealing past but not current infections. The Santa Clara County study recruited participants by an ad on Facebook; the Los Angeles study, through a marketing firm. Neither sample was random. Both were selfselected, and the antibody tests had not been vetted by the Food & Drug Administration.
Within a factory, a policy of “cycle testing” of a random sample of employees every day could track the infection level of the entire workforce and trigger alarms, as discussed below.
Cycle Testing
Warehouse managers are familiar with cycle counting as a preferred means of assuring inventory accuracy. If testing for COVID19 infection were as quick, easy, accurate and cheap as taking temperature, then all employees could be tested every day when arriving at work.
The technology for this may eventually exist but, as of now, each test costs about $100, requires specially trained operators, takes up to a week to get the results, and 15% to 30% of the negative results are false.
The processing time for a test sample is on the order of 6 hours, and the rest of the lead time is in transportation and waiting. This says that, for tests to be of any use in a factory, the samples must be processed inhouse, and that the factories must set up their own labs.
Cycle testing is based on cycle counting. You partition the employees into random groups of equal size and test one group every day. For a workforce of 2,000, you might form 20 groups of 100. You would select each group to be a representative random sample. Testing one group every day with sameday results would achieve two purposes:
 Monitoring the level of infection within the workforce, including asymptomatic carriers.
 Testing each employee every 20 working days.
Once the level of infection is low singledigit percentages, you can increase testing productivity without losing accuracy. There is a method first used by the US Army in World War II to test soldiers for syphilis. It involves mixing the samples from a group and testing it. A negative result clears all the soldiers in the group. A positive one causes recalls for individual retests.
Further Reading
Current Publications
 Kucharski, A. (2020) The Rules of Contagion, Profile Books, ISBN: 9781788160193
 BloudoffIndelicato, M.(4/2/2020)This company claims its smart thermometer could help detect coronavirus hot spots faster than the CDC
 Strochlic, N. & Champine, R. D. (3/28/2020) How some cities ‘flattened the curve’ during the 1918 flu pandemic, National Geographic
 Beasley, D.M. (4/22/2020) COVID19 could detonate a ‘hunger pandemic’, Address to the UN Security Council, quoted in the Washington Post
 Levinson, W.A. (4/28/2020) Break the Curve and Keep It Broken, QualityDigest
 Sanche S, Lin YT, Xu C, RomeroSeverson E, Hengartner N, Ke R. (2020) High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerg Infect Dis. 2020 Jul.
 Rockmore, D. N. & Herron, M. (4/13/2020) Want to know how many people have the coronavirus? Test randomly, The Conversation
Analytical Methods
 Reed, C. et al. (2015) Estimating Influenza Disease Burden from PopulationBased Surveillance Data in the United States, PLoS One 2015; 10(3): e0118369.
 Bettencourt,L. M. A. & Ribeiro, R.M. (2008) Real Time Bayesian Estimation of the Epidemic Potential of Emerging Infectious Diseases, PLOS ONE

Smolinski, M.S, Hamburg, M.A, and Lederberg, J. (2003) Microbial Threats to Health, National Academies Press (US), ISBN10: 030908864X
Historical Background
 Johnson, S. (2006) The Ghost Map, Riverhead Books
 Tufte, E. (1997) Visual Explanations, Graphics Press, ISBN: 0961392126, pp. 2732
 Meyer, J. (1969) L’enquête de l’Académie de Médecine sur les épidémies 17741794 (Investigation of epidemics by the French Academy of Medicine), Études rurales, n°34, 1969. pp. 769
#covid19, #cycletesting, #epidemiology, #flatteningthecurve, # restartingfactories
Sid Joynson
May 7, 2020 @ 1:08 am
Michel, fascinating. Should help in the Coronawar.
Some advice from previous warwinning experiences, will help fight the next war, and would have made a massive difference to this one.
“It is better to be prepared for war; than to hope the enemy will not come.” I have seen this attributed to Sun Tzu and Von Clausewitz. It doesn’t matter who said it, but it is important that our leaders and the Lean movement understand it.
Process Behavior Charts and Covid19  Donald J. Wheeler  Quality Digest – Michel Baudin's Blog
September 10, 2020 @ 12:01 am
[…] The Math of COVID19, and Factories […]
Evan Graham
February 24, 2021 @ 11:31 am
Hello Mr Baudin, this is yet again another great and indepth article about the ongoing effects that COVID19 has had on the world around us. I was particularly intrigued by the idea of managers adapting to this cycling testing to gauge the possible infection rate of their employees. Dividing employees into groups and randomly testing them could prove costly, but is definitely necessary if the business wishes to move forward.