In Capacity Planning For 1st Responders, we considered the problem of dimensioning a group so that there is at least one member available when needed. Not all service groups, however, are expected to respond immediately to all customers. Most, from supermarket check stands and airport check-in counters to clinics for non-emergency health care, allow some amount of queueing, giving rise to the question of how long the queues become when the servers get busy.
At one point in his latest book, Andy and Me And The Hospital, Pascal Dennis writes that the average number of patients in an emergency room is inversely proportional to the availability of the doctors. The busier the doctors are, the more dramatic the effect. For example, if they go from being busy 98% of the time to 99%, their availability drop by half from 2% to 1%, and the mean number of patients doubles. Conversely, any improvement in emergency room procedures that, to provide the same service, reduces the doctors’ utilization from 99% to 98%, which cuts the mean number of patients — and their mean waiting time — in half.
The rule cited by Pascal Dennis is clearly precious in setting improvement priorities, where it applies. The following paragraphs explore what basic queueing theory tells us about this and practical ways to check it in a real system.
From Production Lines To Queueing Systems
No matter how much they bemoan “variability” in their work, manufacturing professionals operate in a controlled environment where there is less of it than in almost every other human endeavor, and it does not prepare them for the level found in service, where events are driven by independent actions from multiple agents.
Saturation Of An Assembly Line
When work arrives at a work station at fixed intervals one piece at a time and each piece undergoes the same operation in the same, fixed amount of time, the station can be used 100% of the time without accumulating a queue in front. This is what happens inside an assembly line. If it works at a takt time of, say, 58 seconds and you are at station 15, every 58 seconds a workpiece arrives from station 14, you work on it for up to 58 seconds according to a fixed, standard method, and it goes on to station 16.
It is a deterministic system, even if it exists only in approximations. Real assembly lines are subject to small disruptions and variations in the pace at which assemblers work, which make it impossible to specify exactly 58 seconds of work at every station but still, a moving assembly line physically prevents the accumulation of WIP between stations and does deliver a new workpiece at fixed intervals.
In order fulfillment, queueing will still happen with orders upstream from production and with finished goods downstream, but most of it is engineered out of production. In Manufacturing, there is queueing also in peripheral activities like the arrival and departure of employees at the beginning and the end of each shift, food services, and restrooms. Many factories, for example, require employees to stand in line to punch in and out of the facility, and to spend half their lunch breaks walking to a central cafeteria and standing in line.
Why Do Customers Appear To Arrive In Clusters?
A station where an individual performs services of varying durations for customers arriving randomly differs from a station on an assembly line in many ways, but particularly in that the waiting line of customers grows explosively as the station’s utilization approaches 100%. In this context, even the notions of workload and capacity must be envisioned differently than in an assembly line. Instead of workpieces fed to the first station at fixed intervals, customers arrive randomly; instead of fixed process times, you have random service times.
As a teenager, I occasionally spent Saturday afternoons in my sister’s bookstore in Paris, and observed the flow of incoming customers. They seemed to come in clusters. The store abruptly filled up, and gradually emptied out. Then there was dry spell, followed by another crowd. I found this strange, because the customers were entering from a sidewalk where I saw a steady stream of pedestrians, and they came in as individuals who didn’t know each other. I couldn’t see any clustering mechanism at work, and did not understand what was going on until several years later, when I studied queueing theory.
The most basic model, applicable to bookstore customers, is more complicated than the assembly line case in that it requires some High School math. It assumes independent arrivals as a steady average rate. You may, for example, observe 100 customers entering the store in 100 minutes, giving you an arrival rate of 1 customer/minute, or a mean time between customers of 1 minute. Then you assume that the numbers of customers entering in any disjoint time intervals are independent. The math then shows you why your intuition is wrong when it tells you to expect a steady flow of incoming customers. Let’s examine dry spells — that is intervals during which no customer arrives. What is the probability α(t) of a dry spell of duration t?
For the waiting time for the next customer to be t, we must have a dry spell of length t but not of length t+dt, which means that we have a dry spell of length t followed by an arrival between t and t+dt. Because arrivals in [0,t] and [t, t+dt] are independent, the arrival rate is a constant λ, and we have 0 arrivals in [0,t] followed by 1 in [t, t+dt]. Therefore:
It is the cumulative distribution function of the exponential distribution for interarrival times. λ is the arrival rate, and the probability density function for interarrival times is:
When you plot this, you understand why customers appear to arrive in clusters in the absence of any clustering mechanism. Short times between customers occur more often than long ones, and, to the observers, make the customers appear to arrive as a group. The infrequent long intervals, on the other hand, separate these groups and stand out on the timeline, just for being long.
In the literature, this model of customer arrivals is called a Poisson Process with rate λ, and is usually introduced by pulling the math out of a hat, without explaining the assumptions of a constant rate and independent arrivals. The following picture is a simulation of Poisson arrivals at the rate of 1 per minute for two hours:
The shopkeeper who sees customers draining out during the busiest shopping day of the week doesn’t need to fret. It’s the Poisson process at work, and the next gaggle of customers is around the corner.
The assumptions matter because they explain why the Poisson Process fits, for example, the pooled output of many independent series of events, like the failures occurring on 100 different machines and queued for maintenance attention, even when the Poisson process is a poor fit for the failures of any individual machine. In reliability theory, the arrival rate of failures on one machine is called its hazard function, and it is not always a constant.
Likewise, in the absence of a specific disaster or epidemic, each of the, say, 50,000 people served by a hospital has his or her own health issues that may occasionally require a visit to the emergency room. In aggregate, you can expect this population to produce patients arriving independently at a steady average rate, and you can model the patient arrivals as a Poisson Process. Of course, disasters and epidemics do occur, during which this logic no longer applies.
The Poisson Process is not always a match for the more complex queueing behavior of human beings. Instead of joining a long queue, they may balk and walk away. In societies with chronic shortages, as the Soviet Union was, they may instead join a queue without knowing what it is for, in case it is for something valuable. In such situations, the arrival rate of new customers depends on the population already there, and therefore the independence assumption no longer applies.
Some service operations have independent arrivals at rates that vary over time. If you serve food, for example, you can expect peaks around meal times. This can be accommodated in the Poisson Process by using a variable arrival rate λ(t) instead of a constant λ. There are also refinements to model actual clustering, which is what I worked on early in my career, as a mathematician, before I got initiated to factories. The math, however, quickly becomes unable to produce simple formulas, and the most practical way to analyze such processes is by simulating them.
The Service Process
We have discussed how customers arrive but not how they receive service. There are two issues to consider:
- The order in which customers are served. First-In-First-Out (FIFO) is the discipline with the simplest math. FIFO is usually easy to implement with people because it is simple and they perceive it as fair. Even when customers are people, however, there are cases where it doesn’t make sense. In an emergency room, for example, patients who arrive in a life-threatening condition should be treated ahead of those who don’t. In manufacturing, FIFO preserves the process sequence, which is esssential in diagnosing quality problems. Computer operating systems, on the hand, use complex algorithms to allocate slices of processor time to competing processes.
- The time it takes to serve each customer. If the service is a flu shot, there is little variation in duration between customers; if it is checking out of a supermarket, it varies, roughly in proportion to the number of items in each shopper’s cart but it is also influenced by the mode of payment; if it is delivering a checked suitcase to an airport baggage claim carousel, it varies with the position of the suitcase in the delivery sequence,…
The simplest model, extensively covered in the literature, is FIFO with a single server and exponential service times. It means that, while a customer is being served, the time to complete service is between t and t+dt with a constant probability μdt, with μ known as the service rate. Similarly to the arrival process, the probability density function of the service time is:
In real service systems, you often have multiple servers for a single queue. At US airports, for example, a single queue of passengers waits before multiple check-in counters. Supermarket check-out stands could be organized the same way but usually are not, because managers believe that it would relieve the pressure on check stand operators and that their productivity would drop.
As indicated above, you also have many cases where service times just are not exponential. Exponential service times are nonetheless of interest in an analysis of saturation because, as we shall see, the key results apply to more general cases.
Saturation In Service
Let us now consider the complete arrival-service-departure process, with a single server. In the literature, it is often called a birth-and-death process, each arrival being a metaphorical birth and each departure a death. This is an awkward terminology when discussing hospital patients. Even though it takes longer to say, arrival-and-departure process is more appropriate in this context.
Saturation With Poisson Arrivals And Exponential Service Times
For the time being, we’ll assume Poisson arrivals at rate λ and exponential service with completion rate μ. In the literature, this is called an M/M/1 queue; with c>1 servers, an M/M/c queue; with another service time distribution, an M/G/1 queue; with another arrival process and service time distribution, a G/G/1 queue, etc. In all these cases, the process reaches a steady state only when the arrival rate λ is less than the service rate μ.
In all these cases, obviously, if customers arrive faster than they leave, the number of customers grows indefinitely. What we’ll show with the M/M/1 queue is that, as λ approaches μ from below, the mean number of customers in the system grows like 1/(1-ρ) where ρ = λ/μ is the utilization of the system. Then we’ll list the results that show this to be true as well for M/M/c , M/G/1, and G/G/1 queues, with few restrictions on the “general” distributions.
In an M/M/1 queue, Let Pi be the steady-state probability of having i customers in the system, either waiting or being served. Then P0 is the probability that there is no customer, and therefore that the server is idle. Then 1-P0 is the server’s utilization, and we want to know what happens to the queue when this utilization approaches 100%. The Pi are calculated by balancing the in- and outflows of each state. You reach the empty state 0 from the state 1 by having this customer completing service; you leave it when a new customer arrives. Therefore:
For i ≥ 2,
The mean number of customers in the M/M/1 queue is therefore:
The value of this result is that it applies to more general cases. The math to show it is more complex, and I am not showing the derivation of the formulas here, only their application when the queueing system saturates. The best option to go beyond the cases for which there are formulas is to use simulations.
Poisson Arrivals, Exponential Service Times, and Multiple Servers
At airports, to check in or go through border controls, you wait in a single line snaking between barriers, to be served at the first available of a row of counters. The logic is the same when you wait on hold for customer service, or where you take a ticket with a number for service to be provided by any member of a group.
The basic mathematical model for this multiserver FIFO queue is called M/M/c, with c servers, Poisson arrivals, and exponential service times. It does not behave like the single server queue when lightly loaded but does when it is saturated. The utilization is ρ = λ/cμ and the average number of customers in the system becomes the sum of two terms:
- The first one resembles the formula for the M/M/1 queue and accounts for the number of customers in the system when all servers are busy.
- The second one is the average number of customers being served.
The formula is as follows:
where C(c, ρ) , given by the Erlang C formula, is the probability that all servers are busy. Clearly, as the utilization ρ → 1, so does C(c, ρ). C(c,ρ) is not an Excel built-in function, but can be calculated from the Poisson built-in function as follows:
=1/(1 + (1-rho)POISSON(cee-1, ceerho, TRUE)/POISSON(cee, cee*rho,FALSE))
where you put the number of servers cee and the utilization rho in correspondingly named cells. In R, there is a package called queueing that includes a function called C_erlang. The following chart plots L as a function of ρ as the number of servers varies:
It shows how, as utilization approaches 100%, all the curves merge.
As it approaches 100%, the complex formula for the number of customers in the system reduces to:
Where A(c) is only a function of c. Asymptotically, the M/M/c queue saturates like the M/M/1 queue.
The math tells you that an M/M/c queue outperforms a set of c M/M/1 queues to provide the same service, each with an arrival rate of λ/c. and it makes you wonder why grocery supermarkets have a separate line for each check stand. The problem is that many of the assumptions behind the math are not satisfied in actual operations.
With multiple M/M/1 queues, you assume that customers are equally likely to join any, but the lines are in plain view and the customers can choose the shorter ones, and even switch lines midway. Real check-stand operators, also, don’t have the same service rates. Some work faster than others, and these differences are more visible when each stand has its own line than when they share a common one.
Poisson Arrivals, General Service Times, Single Server
For Poisson arrival and any service time distribution that has a mean and standard deviation, an M/G/1 queue, we have the Pollaczek-Khinchin formula:
where ρ = λ/μ, 1/μ is the mean of the service time S, and Var(S) its variance. Then, when ρ → 1,
where A(S) is only a function of the service time distribution, and the queue also saturates like the M/M/1 queue.
General Arrival Process, General Service Time Distribution, Single Server
In 1961, John Kingman, in the UK, published a more general formula, for the expected waiting time in a G/G/1 queue, where we have a different distribution of interarrival times, of which we only require that is has a positive mean and a standard deviation. Then the mean waiting time Wq is:
where ca and cs are the coefficients of variation, of interarrival times and service times, defined as the ratios of standard deviation to mean. With Poisson arrivals at rate λ, the interarrival times are exponentially distributed, with a mean and standard deviation both equal to 1/λ. Therefore ca = 1. With a general distribution of interarrival times, ca can be anything.
To go fromWq to L, we add ρ for the mean number of customers being processed by the server, and multiply by λ to apply Little’s Law.
If ca is bounded for the arrival rate λ in a neighborhood [μ-ε, μ], then we can bracket L as follows:
where A(S) and B(S) are functions only of the service times. Again, as ρ → 1, L rises like 1/(1-ρ).
More General Situations
The complexity of real queueing systems usually exceeds the best mathematicians’ ability to derive simple formulas, but the performance of such systems can be analyzed through simulations.
Pascal Dennis’s assumption that, in a hospital emergency room, as the doctor’s utilization ρ approaches 100%, the number of patients rises like 1/(1-ρ) is a reasonable starting point for steady-state operations, understanding that the dynamic is different in times of epidemics or natural disasters. And you may want to verify it through observations.
One consequence alluded to earlier is that the waiting lines in a saturated system can be massively cut by increasing service rates by just a few percentage points, with diminishing returns as the system becomes less saturated. Going from 99% to 98% in server utilization can cut your mean waiting line in half, but going further from 98% to 95% gets you less.
Once you are out of the saturation zone where you have a clear strategy to cut the mean waiting time in half, you can switch to other forms of improvement, in customer sequencing or in the physical organization of the waiting system. If your system’s customers are people, for example, you may give priority to some of them, and you may organize the waiting so that they don’t personally have to stand in line for hours outside, for example by using numbered tickets.
Generally, waiting is regarded as a waste of time, particularly when it requires standing in line. It has been argued that, in the old Soviet Union, waiting was the main economic activity, because it was how citizens got things and their frustration with constant shortages was a factor in the collapse of the country.
In the US, oddly, some organizations have managed to make people volunteer or even pay to stand in line. For over 60 years, Disneyland has sold tickets for the privilege of standing in line for hours at rides that last minutes, and Apple product launches under Steve Jobs were occasions for fans to wait overnight at stores.