How do I analyze historical consumption for 13,000 items?

Supply chain consultant Hadas Gur asked the following question:

I have data of demands for 13000 SKUs (consumptions from the last 5 years). 6000 of the observations are zeros.  I can’t recognize the distribution of the data . I have tried the q-q plot to find a match to any known distribution. What can I do in this case if I want to find the reorder point? Is it ok to use the reorder point formula which is in your post “Safety Stocks : Beware of Formulas” even though the distribution is not normal?

You do not give a context. Are those SKUs components supplied to a manufacturing company or retail items on supermarket shelves? The demand patterns may be radically different. In retail, for example, the demand for milk is the sum of the quantities bought by a large number of individual consumers acting independently, and the normal distribution is a likely fit. On the other hand, if you are supplying a model-specific part to a car manufacturer, it is unlikely to fit.

Do not try to apply the same approach to all 13,000 SKUs! For example, reorder point makes no senses for the 6,000 items that have had no demand in the past 5 years. You would want to investigate whether they should still be in the catalog and, if so, they are strangers and you need to organize to make or buy them when an order arrives.

For the others, I would suggest you explore the data rather than focus on fitting a distribution, starting with a Runner/Repeater/Stranger analysis. Then, starting with runners, investigate trends and seasonal variations. For repeaters, I would investigate ways to group them into families that make sense for what you are trying to do.

Do not use only the data. In order to understand what is possible, you need to visit the warehouses or distribution centers and understand how physical distribution distribution is organized, and the people involved.

Then consider a range of approaches for different items and item families, including just-in-sequence, Kanban, two-bin, reorder point, vendor-managed inventory, consignment, etc.  Examine how these approaches would have performed with the consumption pattern of the last 5 years. You can also simulate future demand.

Is the Kanban system to ensure availability of materials or to reduce inventory?

Pranay Nikam, from VCT Consulting India, asked the following question:

“I have designed and implemented the Kanban System at various type of industries. The challenge I face now is not that of explaining people how the system is designed or how it works. But rather clearing the misconception/misunderstandings key industry people have about Kanban.

My understanding of a Kanban System is ‘A Consumption based replenishment system’ with Multiple Re-Order Point (multiple Bins) as opposed to the traditional Two Bin System. In simpler words you keep enough stock to cover for the total lead time and add a buffer for demand variation and supply failures. And keep replenishing the stock as and when you consumes. The replenishment can be through fresh production or withdrawal from Warehouses or procurement from supplier.

Prime objective of the Kanban System is material availability to enable High Mix and low volume production; ultimately to support production levelling instead of running huge batches.

However, some Lean Consultants propagate Kanban as a inventory reduction tool and nothing more than a material scheduling software that can be configured in any ERP Systems.

I would be happy be receive your comments on the two different perspectives.”

The Kanban system has many variants, discussed in Chapters 10 to 13 of Lean Logistics. All these variants, however, have the following characteristics in common:

  1. They implicitly assume the demand for an item in the immediate future to match the recent past. It is a naive forecast, but hard to beat on intervals that are negligible with respect to what Charlie Fine calls the clockspeed of the business. And the fluctuations are smoothed by leveling/heijunka.
  2. They use some form of tokens to signal demand. Whether these tokens are cards or electronic messages, they can be detached from bins and parts and processed separately, in ways that are not possible, for example, in the two-bin system.
  3. There is a fixed number of tokens in circulation for each item, which is a key control mechanism for the supply of this item.
  4. The protocols for handling these tokens provide unambiguous directions on what should be done. No human judgement call is required to decide which item to move or produce. There are variations where that is not the case, like the French Kanban, which, for this reason, I don’t consider genuine.

The Kanban system  is not just a multiple-bin system, because bins are not used as pull signals. The Kanbans are pulled from bins when you start withdrawing parts from it, which you couldn’t do if the bin itself were used as a signal. If the signals are cards, you can organize them in post-office slots or on boards, which you also couldn’t do with bins. And, of course, you can do much more with electronic signals, which does not necessarily mean you should.

Your description of Kanban omits the goal of keeping inventory as low as you can without causing shortages, and experimenting with the numbers of Kanbans in circulation to test where the limit is, which makes it a tool to drive improvement.

Kanbans work for items consumed in quantities that have small fluctuations around a mean, which means medium-volume/medium mix rather than low-volume/high mix. You use other methods for different demand patterns, like reorder point for bulk supplies, consignment for standard nuts, bolts and washers, or just-in-sequence for option-specific large items… In low-volume/high-mix production you have many items that you cannot afford to keep around and only order from your supplier when you have an order from your customer; it isn’t the way the Kanban system works.

You can do many things with ERP systems but, historically, they have been more effective in managing purchase orders with suppliers than in directing shop floor operations. If you have an ERP system with accurate, detailed data about your shop floor, you can, in principle apply any algorithm you want to produce a schedule. Most ERP systems, however, do not even have structures in their databases to model the behavior of production equipment at a sufficient level of detail, and are not capable of producing actionable schedules. They print recommendations, and the final decision on the work that is actually done is a judgement call by the supervisor, or even sometimes the operator. Within its range of applicability, the Kanban system avoids this with simple rules, by focusing on what is actually observable and controllable at the local level.

So, I suppose the answer to your question is that the Kanban system’s immediate purpose in daily operations  is to assure the availability of materials while reducing inventory, with the longer-term purpose of driving improvement. Pursuing either of these goals at the expense of the other would be easier but not helpful to the business.

Safety Stocks: More about the formula

In a previous post on 2/12/2012, I warned against the blind use of formulas in setting safety stock levels. Since then, it has been the single most popular post in this blog, and commands as many page views today as when it first came out. Among the many comments, I noticed that several readers, when looking at the formula, were disturbed that three of the four parameters under the radical are squared and the other one isn’t, to the point that they assume it to be a mistake. I have even seen an attempt on Wikipedia to “correct that mistake.”

I was myself puzzled by it when I first saw the formula, but it’s no mistake.  The problem is that most references, including Wikipedia,  just provide the formula without any proof or even explanation. The authors just assume that the eyes of inventory managers would glaze over at the hint of any math. If you are willing to take my word that it is mathematically valid, you can skip the math. You don’t have to take my word for it, but then, to settle the discussion, there no alternative to digging into the math.

A side effect of working out the math behind a formula is that it makes you think harder about the assumptions behind it, and therefore its range of applicability, which we do after the proof. If you don’t need the proof, please skip to that section.

Math prerequisites

As math goes, it is not complicated. It only requires a basic understanding of expected value, variance, and standard deviation, as taught in an introductory course on probability.

In this context, those who have forgotten these concepts can think of them as follows:

  • The expected value E(X) of a random variable X can be viewed, in the broadest sense, as the average of the values it can take, weighted by the probability of each value. It is linear, meaning that, for any two random variables X and Y that have expected values,

E[X+Y] = E[X]+E[Y]

and, for any number a,

E[a\times X]= a \times E[X]

  • Its variance is the expected value of the square of the deviation of individual values of X from its expected value E(X):

Var(X) = E[X-E(X)]^{2}= E[X^{2} -E(X)^{2}]

Variances are additive, but only for uncorrelated variables X and Y that have variances. If

E[[X-E(X)] \times [Y-E(Y)]]= 0


Var(X+Y) = Var(X)+Var(Y)

  • Its standard deviation is

\sigma = \sqrt{Var[X]}

Proof of the Safety Stock Formula

Fasten your seat belts. Here we go:

As stated in the previous post, the formula is:


  • S is the safety stock you need.
  • C  is a coefficient set to guarantee that the probability of a stockout is small enough.
  • The other factor, under the radical sign, is the corresponding standard deviation.
  • μL and σL are the mean and standard deviations of the time between deliveries.
  • μD and σD are the mean and standard deviation rates for the demand.

  \sqrt{\mu{_{L}^{}}\times\sigma_{D}^{2}+\mu_{D}^{2} \times \sigma_{L}^{2}} is the standard deviation of the item quantity consumed between deliveries, considering that the time between deliveries varies.

μD and  \sigma_{D}^{2} are the mean and variance of the demand per unit time, so that the demand for a period of length T has a mean of \mu_{D} \times T , a variance of  \sigma_{D}^{2} \times T, and therefore a standard deviation of \sigma_{D} \times \sqrt{T}. See below a discussion of the implications of this assumption.

Note that the assumptions are only that these means and variances exist. At this stage, we don’t have to assume more, and particularly not that times between deliveries and demand follow a particular distribution.

If D(T) is the demand during an interval of duration T, since:

we have:

If we now allow T to vary, around mean μL with, standard deviation σL , we have:

and therefore:


That’s how the variance ends up linear in one parameter and quadratic in the other three!


 \sigma\left [ D \right ]= \sqrt{\mu _{L}\times\sigma_{D}^{2} + \mu _{D}^{2} \times \sigma_{L}^{2} }


Note that all of the above argument only requires the means and standard deviations to exist. There is no assumption to this point that the demand or the lead time follow a normal distribution. However, the calculation of the multiplier C used to calculate an upper bound for the demand in a period, is based on the assumption that the demand between deliveries is normally distributed.


The assumption that the variance of demand in a period of length T is \sigma_{D}^{2} \times T implies that it is additive, because if  T = T_{1} + T_{2}, then \sigma_{D}^{2} \times T = \sigma_{D}^{2} \times T_{1} + \sigma_{D}^{2} \times T_{2}.

But this is only true if the demands in periods T_{1} and T_{2} are uncorrelated. For a hot dog stand working during lunch time, this is reasonable: the demands in the intervals between 12:20 and 12:30, and between 12:30 and 12:40 are from different passers by, who make their lunch choices independently.

On the other hand, in a factory, if you make a product in white on day shift and in black on swing shift every day,  then the shift demand for white parts will not meet the assumptions. Within a day, it won’t be proportional to the length of the interval you are considering, and the variances won’t add up. Between days, the assumptions may apply.

More generally, the time periods you are considering must be long with respect to the detailed scheduling decisions you make. If you cycle through your products in a repeating sequence, you have an “Every-Part-Every” interval (EPEI), meaning, for example, that, if your EPEI is 1 week, you have one production run of every product every week.

In a warehouse, product-specific items don’t need replenishment lead times below the EPEI. If you are using an item once a week, you don’t need it delivered twice a day. You may instead receive it once a week, every other week, every three weeks, etc. And the weekly consumption will fluctuate with the size of the production run and with quality losses. Therefore, it is reasonable to assume that its variance will be \sigma_{D}^{2} \times T where T is a multiple of the EPEI, and it can be confirmed through historical data.

You can have replenishment lead times that are less than the EPEI for materials used in multiple products. For example, you could have daily deliveries of a resin used to make hundreds of different injection-molded parts with an EPEI of one week. In this case, the model may be applied to shorter lead times, subject of course to validation from historical data.

Safety Stocks: Beware of Formulas

A formula you find in a book or learn in school is always tempting. It is a “standard.” If you follow it, others are less likely to challenge your results. These results, however, may be worthless unless you take a few precautions. Following are a few guidelines:

  1. Don’t use a formula you know nothing about. Its validity depends on assumptions that may or may not be satisfied. You don’t need to know how to prove the formula, but you need to know its range of applicability.
  2. Examine your data. Don’t just assume they meet the requirements. Examine their summary stats, check for the presence of outliers, generate histograms, scatter plots, time series, etc.
  3. Don’t make up missing data. If you are missing the data you need to estimate a parameter, find what you can infer about the situation from other parameters, by other methods. Do not plug in arbitrary values.
  4. Make your Excel formulas less prone to error by using named ranges rather than cell coordinates. If a formula is even slightly complicated, referring to variables by names like “mean” or “sigma” makes formulas easier to proof-read than with names like “AJ” or “AK.”

1. The safety stock formula for the reorder point method

Safety stock is a case in point. The literature gives you a formula that is supposed to allow you to set up reorder point loops with just the minimum amount of safety needed to prevent shortages under certain conditions of variability in both your consumption rate and your replenishment lead time. It is a beautiful application of 19th century mathematics but I have never seen it successfully used in manufacturing.

Let us look more closely at what it is so you can judge whether you would want to rely on it. Figure 1 shows you a model of the stock over time when you use the Reorder Point method and both consumption and replenishment lead time vary according to a normal distribution. The amount in stock when the reorder point is crossed should be just sufficient to cover your needs until the replenishment arrives. But since both replenishment lead time and demand vary, you need some safety stock to protect against shortages.

Figure 1. The reorder point inventory model

If your demand is the sum of small quantities from a large number of agents, such as sugar purchases by retail customers in a supermarket, then the demand model makes sense. In  a manufacturing context, there are many situations in which it doesn’t. If you produce in batches, then the demand for a component item will be lumpy: it will be either the quantity required for a batch or nothing. If you use heijunka, it will be so close to constant that you don’t need to worry about its variations.

What about replenishment lead times? If in-plant transportation is by forklifts dispatched like taxis, replenishment lead times cannot be  consistent. On the other hand, if it takes the form of periodic milk runs, then replenishment lead times are fixed at the milk run period or small multiples of it. With external suppliers, the replenishment lead times are much longer, and cannot be controlled as tightly as within the plant, and a safety stock is usually needed.

Let us assume that all the conditions shown in Figure 1 are met. Then there is a formula for calculating safety stock that you can find on Wikipedia or in David Simchi-Levy’s Designing and Managing the Supply Chain (pp. 53-54).  Remember that it is only valid for the Reorder Point method and that it is based on standard deviations of demand and lead time that are not accessible for future operations and rarely easy to estimate on past operations. The formula is as follows:


  • S is the safety stock you need.
  • C  is a coefficient set to guarantee that the probability of a stockout is small enough. You can think of it a number of standard deviations above the mean item demand needed to protect you against shortages. In terms of Excel built-in functions, C is given by:

C = NORMSINV(Service level)

Service levelC
90.0% 1.28
95.0% 1.64
99.0% 2.33
99.9% 3.09
  • The other factor, under the radical sign, is the corresponding standard deviation.
  • μL and σL are the mean and standard deviations of the lead time.
  • μD and σD are the mean and standard deviation of the demand per unit time, so that the demand for a period of length T has a mean of μD xT and a standard deviation of σDx √T

2. Case study: Misapplication of the safety stock formula

This formula is occasionally discussed in Manufacturing or Supply Chain Management discussion groups, but I have only ever seen one attempt to use it,  and it was a failure. It was for the supply of components to a factory, and 14 monthly values were available for demand, but only an average for lead times.

The first problem was the distribution of the demand, for which 14 monthly values were available. This is too few for a histogram, but you could plot their cumulative distribution and compare it with that of a normal distribution with the same mean and standard deviation, as in Figure 2. You can tell visually that the actual distribution is much more concentrated in the center than the normal model, which is anything but an obvious fit.  Ignoring such objections, the analyst proceeded to generate a spreadsheet.

Figure 2. Actual versus normal cumulative distribution

The second problem is that he entered the formula incorrectly, which was not easy to see, because of the way it was written in Excel.  The formula in the spreadsheet was as follows:


then, looking at the spreadsheet columns, you found that they were used as follows:

  • AJ  for Standard Deviation of Daily Demand, and
  • AL for Average Replenishment time.

And therefore the first term under the square root sign was σDL2 instead of μLxσD2.

The third problem was that the formula requires estimates of standard deviations for both consumption and replenishment lead times, but no data was available on the latter. To make the formula produce numbers, the standard deviations of replenishment lead times was arbitrarily assumed to be 20% of the average.

For all of these reasons, the calculated safety stock values made no sense, but nobody noticed. They caused no shortage, and the “scientific” formula proved that they were the minimum prudent level to maintain.

3. Sizing safety stocks in practice

There is no universal formula to determine an optimal size of safety stocks. What can often be done is to simulate the operation of a particular replenishment loop under specified rules. For a simulation of a Kanban loop using Excel, see Lean Logistics, pp. 208-213.

No calculation or simulation, however, is a substitute for keeping an eye on what actually happens on the shop floor during production. One approach is to separate the safety stock physically from the regular, operational stock and monitor how often you have to dig into it. If, say, six months go by without you ever needing it, you are probably keeping too much and you cut it in half. With a Kanban loop, you tentatively remove a card from circulation. If no shortage results, then the card was unnecessary. If a shortage occurs, you return the card and look for an opportunity to improve the process so that the card can be removed.