The Equations Behind Epidemics

Understanding the SIR model, which describes how an outbreak can catch fire from humble beginnings.
October 5, 2014, 12:30pm

Within epidemiology, the SIR model is a way of predicting the spread of an infectious disease through a closed population using three broad, easily defined categories of individuals: susceptible (S), infectious (I), and recovered (R). These three population segments are related by a series of relatively unmessy differential equations that demonstrate some striking things about how a disease can spread so quickly from so small a start.

The SIR model was first published in 1927 by W. O. Kermack and A. G. McKendrick, epidemiologists who were looking to explain the observation that epidemics can get very large with extremely humble beginnings. Epidemiologists of the time were at a loss, lacking a single causal factor sufficient to account for the frequent outbreaks tearing through society on a regular basis. Epidemics often just made no sense.

What the duo found in their equations were two fundamental principles: an epidemic may exhaust itself before the susceptible population reaches zero, and there is a threshold number of susceptible members in a population below which an epidemic will not form. The SIR model was accurate enough that it persists today.

Its description below comes with help from course materials provided by the Mathematical Association of America. The equations are included not to induce pain so much as to illustrate the relative simplicity of the basic relationships.


In the SIR model, people are never added to or subtracted from the total population. That is, dying is considered recovery here, and no one is ever added to the susceptible group via things like immigration or births. As a member of a population, you either have, have had (whether fully recovered, dead, or otherwise no longer infectious), or have not had the disease in question. Immune members can be said to occupy the recovered group.

So, we have three different equations showing the rates of change for each population segment. The rate of change in the susceptible group is modeled as such:
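
Written out from the description that follows, the relationship has roughly this form. The notation is a reconstruction rather than the original figure: B is the article's contact coefficient, a lowercase s(t) stands for the susceptible fraction, an uppercase I(t) for the number of infected people, and N (an added symbol) for the fixed total population.

```latex
\frac{dS}{dt} = -B \, s(t) \, I(t)
\qquad\Longleftrightarrow\qquad
\frac{ds}{dt} = -B \, s(t) \, i(t),
\quad \text{where } s = \frac{S}{N},\; i = \frac{I}{N}
```

The second version is just the first divided through by N, so that everything is expressed as a fraction of the population.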

The B coefficient in the above equation corresponds to the average number of contacts an average member of the population makes per day. The next piece, the S, is the number of susceptible members at a given time (as a fraction of the total population), and the last one, I, is the number of infected members at a given time (as a bare number, not a fraction).

This tells us the rate of change in susceptible members of a population. If the present fraction of susceptible members is relatively small, we can expect a slower rate of change in the susceptible population, as the larger number of infected or recovered patients limits the number of people in the population who can still become infected (the recovered can't get sick again and, thus, can't be infectious again).

As the susceptible fraction shrinks, each of those daily contacts carries less and less weight, because fewer of them land on someone who can still get sick. At the other extreme, when nearly everyone is susceptible, the rate of change in the susceptible population is basically just the number of infected people times the average number of contacts per day, with an ugly negative sign. That's how things get out of control: there's no buffer of infected or recovered members to absorb some of those daily B contacts.

The "recovered" equation is easier:
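
Reconstructed from the description that follows (the lowercase r(t) and i(t) are assumed to be the recovered and infected fractions of the population), it reads something like:

```latex
\frac{dr}{dt} = \gamma \, i(t)
```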

The γ here is the fraction of infected members expected to recover per day for a given illness. That recovery fraction just gets multiplied by the fraction of infected members, giving the rate at which people join the recovered group, i.e. those who were once sick but no longer are. Again, this doesn't mean that these members are healthy and walking around, just that they're no longer infectious (and so don't factor into the first equation anymore). What γ actually is depends on the specific disease and its specific mortality rate.

So, when you look at the following graph, remember that r(t) might be 90 percent dead people. Again, these are the stakes in catching an outbreak early.

The whole equation, with all three population segments accounted for, is this, which gives us the rate of change in the fraction of infected individuals.
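
A reconstruction consistent with the other two relationships, assuming lowercase s(t) and i(t) are the susceptible and infected fractions: since nobody enters or leaves the population, whatever flows out of the susceptible group and has not yet flowed into the recovered group must be sitting in the infected group.

```latex
\frac{di}{dt} = B \, s(t) \, i(t) - \gamma \, i(t)
```

Adding this to the susceptible and recovered rates (both taken as fractions) gives zero, which is the closed-population assumption written as arithmetic.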

This one is basically the rate at which susceptible people are becoming infected minus the rate at which infected people are recovering. Eventually these two terms balance, and, as you can see from the graph, the infected population stops growing.

The graph above is specifically for the spread of Hong Kong Flu through New York City in the 1960s. In that situation, the number of immune or recovered members of the population was as low as 10 at the start of the outbreak, leaving an enormous susceptible population, and an enormous potential for the disease to spread.

The values for B and γ in this particular situation were selected somewhat arbitrarily: a given individual contacts a new person every other day, giving us a B of just 1/2. Meanwhile, the recovery rate γ assumes a recovery period of three days, leaving a value of 1/3. Bigger values of B (contacts per day) and smaller values of γ (recovery rates) will thus make the spread much worse.
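
To see what those two numbers do, here is a minimal sketch that steps the three fractions forward a tenth of a day at a time (plain forward Euler, chosen for readability rather than accuracy). The function name, the step size, the 200-day horizon, and the starting infected fraction of one in a million are all illustrative assumptions, not values from the article.

```python
def simulate_sir(b=0.5, gamma=1/3, i0=1e-6, r0=0.0, days=200, dt=0.1):
    """Step the SIR fractions forward with forward Euler.

    b      -- contacts per person per day (the article's B)
    gamma  -- fraction of infected people recovering per day
    i0, r0 -- assumed starting infected and recovered fractions
    Returns a list of (day, s, i, r) tuples.
    """
    s, i, r = 1.0 - i0 - r0, i0, r0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for n in range(1, steps + 1):
        new_infections = b * s * i * dt   # flow from susceptible to infected
        new_recoveries = gamma * i * dt   # flow from infected to recovered
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((n * dt, s, i, r))
    return history


history = simulate_sir()
peak_day, _, peak_i, _ = max(history, key=lambda row: row[2])
_, final_s, final_i, final_r = history[-1]

print(f"infected fraction peaks at {peak_i:.3f} around day {peak_day:.0f}")
print(f"susceptible fraction left at the end: {final_s:.3f}")
```

Under these assumptions the outbreak burns out while a large share of the population is still susceptible, which is the first of the two principles Kermack and McKendrick identified.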

The SIR model hides an important lesson on vaccines. The anti-vax crowd is fond of saying stuff like "my child, my business," but we know that's not true because of models like this. The phenomenon at work is herd immunity.

In our New York model, we had nearly 100 percent of the population in the susceptible category, leaving an open playing field for an epidemic to develop and thrive. If Hong Kong Flu had hit the city and there had been some significant population already in the recovered category, the spread of infection would have been dramatically inhibited. Because of how these different populations impact each other, small changes in one population can have huge or seemingly outsized effects on the whole system.
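
A rough way to check that claim is to rerun the same kind of simulation with part of the population placed in the recovered group before the first case appears. The sketch below uses the article's B = 1/2 and γ = 1/3; the immune fractions tried, the starting infected fraction, and the function name are hypothetical choices for illustration.

```python
def peak_infected_fraction(immune_fraction, b=0.5, gamma=1/3,
                           i0=1e-6, days=400, dt=0.1):
    """Return the largest infected fraction reached during the outbreak.

    immune_fraction -- assumed share of the population already in the
                       recovered group at the start
    """
    s = 1.0 - immune_fraction - i0
    i = i0
    peak = i
    for _ in range(int(round(days / dt))):
        new_infections = b * s * i * dt   # susceptible -> infected
        new_recoveries = gamma * i * dt   # infected -> recovered
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak


peaks = {immune: peak_infected_fraction(immune) for immune in (0.0, 0.2, 0.4)}
for immune, peak in peaks.items():
    print(f"{immune:.0%} immune at the start -> peak infected fraction {peak:.6f}")
```

With these inputs, a modest immune share shrinks the peak far more than proportionally, and a large enough one keeps the infected fraction from ever rising above where it started.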

This is reflected in the last equation above, which boils down to taking the susceptible fraction of the population scaled by the contact rate B, subtracting the recovery rate γ, and multiplying the result by the infected fraction. So, the rate of infection depends on how the number of infected at a given time interacts with the fraction of susceptible members. That fraction is a crucial limiting factor in the overall growth rate of infections. (By the by, this relationship between the rate of infections and number of infections makes this a differential equation, a way of relating quantities of things with rates of change of quantities of things.)
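
Factoring the infected-rate equation makes the limiting role of the susceptible fraction explicit. This is a short derivation in assumed notation, with lowercase s and i as the susceptible and infected fractions:

```latex
\frac{di}{dt} = B\,s\,i - \gamma\,i = \bigl(B\,s - \gamma\bigr)\,i
\qquad\Longrightarrow\qquad
\frac{di}{dt} > 0 \iff s > \frac{\gamma}{B}
```

With the article's B = 1/2 and γ = 1/3, the ratio γ/B is 2/3, so infections can only grow while more than two thirds of the population is still susceptible.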

The result of this is that in order to confer immunity on a population, it doesn't take every single member of the population being immune, just a certain critical mass. Subtracting from that critical mass (by refusing vaccines, say) means a higher likelihood of an epidemic taking off throughout the population. Vaccines protect individuals, sure, but they also protect populations, providing a way to draw down the pool of susceptible individuals without infecting them (or infecting them in a virulent way).
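
That critical mass can be estimated from B and γ alone: infections stop growing once B times the susceptible fraction no longer exceeds γ. The helper names below are made up for illustration, and the result only holds under the simple SIR assumptions described in this piece.

```python
def infections_growing(susceptible_fraction, b, gamma):
    """True if the infected fraction is increasing, i.e. b * s > gamma."""
    return b * susceptible_fraction > gamma


def critical_immune_fraction(b, gamma):
    """Smallest immune share at which infections cannot grow.

    Follows from b * s <= gamma, so s <= gamma / b, so the immune
    share must be at least 1 - gamma / b (never below zero).
    """
    if b <= 0 or gamma <= 0:
        raise ValueError("b and gamma must be positive")
    return max(0.0, 1.0 - gamma / b)


# The article's illustrative values: B = 1/2, gamma = 1/3.
threshold = critical_immune_fraction(0.5, 1/3)
print(f"immune share needed to stop growth: {threshold:.1%}")
print("growing with 90% susceptible:", infections_growing(0.9, 0.5, 1/3))
print("growing with 60% susceptible:", infections_growing(0.6, 0.5, 1/3))
```

For these values the threshold comes out to about a third of the population, well short of everyone.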

Imagine a drought-stricken forest, a place littered with deadwood, dried leaves, and dead pine needles. If we toss a match into this mess, it's gonna burn, and fast. But what if we had some water and dumped it around our tinderbox forest? Every wet spot becomes a place much less likely to burn. You might not have enough water to cover the entire forest, but you can spread enough around such that if a dry patch catches fire, it has less dry real estate into which to spread. If you get enough water around, the dry patches stop mattering, and a fire will just burn itself out.