The Lowdown on the Range Chart

Nov 7 2025

This is about the motivation for the R chart and its math. We shouldn’t ask manufacturing professionals to apply a technical tool without explaining its purpose and its theory.
However, without doing either, the SPC literature promotes the use of the R chart to detect changes in the fluctuations of measured variables, along with \bar{X} charts for changes in their means. The books provide recipes for using these charts, but no explanation.
Harold Dodge introduced the R chart 100 years ago to overcome shop floor pushback against calculating sample standard deviations with paper, pencil, and slide rules. While easier to understand and to use daily, sample ranges are mathematically more complex and more sensitive to extreme values than standard deviations.
Like all control charts, the R chart uses limits calculated for the Gaussian distribution. As no simple formula is available for the R chart, setting control limits for it requires numerical approximations that must have consumed months for human computers in 1924.
Today, you can replicate them instantaneously with software. These calculations reveal that the \pm 3\sigma limits in the books for the range chart do not actually encompass the 99.73% of the distribution that they do in \bar{X} charts.
The R chart was an ingenious workaround to technical and human constraints of the 1920s that no longer exist. Today, rather than blindly applying these tools, we must draw inspiration from their inventors and develop solutions to meet the process capability challenges we are actually facing.
Why the Range Chart?
The idea of using the within-sample range of a numeric variable instead of its standard deviation is commonly attributed to Walter Shewhart, but there is no discussion of it in his books. On the other hand, the page about Harold Dodge on the ASQ website quotes him as saying:
“In 1924, our work in cooperation with shop engineers was influenced heavily by great pressures to save money and to make the quality control methods simple and easy to use.
Initially, the basic procedures for variables called for samples of four, with one chart for the average, x̄ , and another for the standard deviation, σ. Shop reaction was prompt against anything as complicated as computing the standard deviation.
After some study we proposed the use of the range, R. On top of that we proposed shop use of samples of five instead of four; it is easier to divide by five than by four. These simplifying steps quickly became the basis for shop practice.”
So, Shewhart’s first disciples replaced his \bar{X}-\sigma charts with the \bar{X}-R charts, not because the range is a better statistic but to defuse a mutiny.
Now that either one is a click away, it is hard to imagine how engineers worked 100 years ago. With paper and pencil, the range of 5 or 10 numbers was easier to compute than the square root of a sum of squares, and it was also easier to understand.
Sensitivity to Extreme Values
In fact, being based entirely on the extreme values within the sample, the range is obviously more sensitive to outliers than the sample standard deviation.
About the Math of Ranges
From the standpoint of inference, however, the math of ranges is more complex. There is, in fact, no known simple formula for the distribution of a sample range, except for uniform variables and, approximately, for large samples.
While many SPC authors keep asserting that measurements do not need a Gaussian (also known as “Normal”) distribution for Control Charts to work, the fact is that the math used to set control limits is based on the assumption that, in a state of statistical control, they are the sum of a constant central value and a Gaussian white noise.
As a consequence, to base limits on sample ranges, you need a model for the distribution of the range of a sample of independent Gaussian variables with the same mean and standard deviation.
The Recipe for Range Charts
The SPC literature provides a recipe for the range chart but omits the math that justifies it. In your process capability study, you start with a sequence of measurements that is representative of your process in the absence of assignable causes of disruption.
You first split it into a sequence of k rational subgroups of size n. That’s kn data points x_{ij} with i=1,\dots, k and j=1,\dots, n. You compute the average and the range of the measurements in each subgroup, i = 1,\dots, k:
\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n}x_{ij} and

r_i = \max_{j =1,\dots,n}\left ( x_{ij} \right ) - \min_{j =1,\dots,n}\left (x_{ij} \right )

which you summarize as
\bar{\bar{x}} = \frac{1}{k}\sum_{i=1}^{k}\bar{x}_i and
\bar{r} = \frac{1}{k}\sum_{i=1}^{k}r_i
Then you look up coefficients in the following table, which vary with the subgroup size n:
n     A2      D3      D4
2     1.880   0       3.267
3     1.023   0       2.575
4     0.729   0       2.282
5     0.577   0       2.115
6     0.483   0       2.004
7     0.419   0.076   1.924
8     0.373   0.136   1.864
9     0.337   0.184   1.816
10    0.308   0.223   1.777
Using these coefficients, you set up the \bar{X} chart with the following parameters:
\text{Center line} = \bar{\bar{x}}
\text{Upper Control Limit} = \bar{\bar{x}} + A2\times \bar{r}
\text{Lower Control Limit} = \bar{\bar{x}} - A2\times \bar{r}
And the R chart:
\text{Center line} = \bar{r}
\text{Upper Control Limit} = D4\times \bar{r}
\text{Lower Control Limit} = D3\times \bar{r}
Then you plot the average and range of data on new samples against these limits, for evidence of special causes of variation.
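As a sketch of this recipe in Python (the function name and the choice of the n = 5 constants are illustrative, not from the original):

```python
# Sketch of the X-bar / R recipe. A2, D3, D4 are the tabulated
# constants for subgroups of size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.115

def xbar_r_limits(subgroups):
    """subgroups: list of equal-sized lists of measurements."""
    xbars = [sum(s) / len(s) for s in subgroups]      # subgroup averages
    ranges = [max(s) - min(s) for s in subgroups]     # subgroup ranges
    xbarbar = sum(xbars) / len(xbars)                 # grand average
    rbar = sum(ranges) / len(ranges)                  # average range
    return {
        "xbar_chart": (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar),
        "r_chart": (D3 * rbar, rbar, D4 * rbar),
    }

limits = xbar_r_limits([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]])
```

New samples are then plotted against these (LCL, center line, UCL) triples.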
This is the cookbook version. It provides no rationale for the values of A2, D3, or D4, and none of the assumptions about the x_{ij} that are needed for them to be valid. Let us fill this gap.
The Distribution of Sample Ranges
The x_{ij}, j = 1,\dots, n in any one of the k subgroups are a sequence of measurements, but we will assume that their order makes no difference and that they are instances of a sample \mathbf{X} = \left ( X_1, \dots, X_n \right ) of n independent and identically distributed (i.i.d.) random variables, with a probability density function (p.d.f.) f and cumulative distribution function (c.d.f.) F. We also assume that f is continuous and that this distribution has a mean \mu and a standard deviation \sigma.
We are going to study the distribution of the random variable
R = \max_{j =1,\dots,n}\left ( X_j \right ) - \min_{j =1,\dots,n}\left (X_j \right )

Generic Model
Because f is continuous, we can neglect the probability that two or more of the X_i take the same value, and the probability that one of the n variables is between x and x+dx is
P(x \le X_1< x+dx) + \dots +P(x \le X_n< x+dx) = nf\left (x \right )dx

Then the probability that the n-1 other X_i are all in [x, x+r] is

U\left ( r , x \right ) = \left [ F\left ( x+r \right ) - F\left ( x \right )\right ]^{n-1}

and the corresponding density is

u\left ( r , x \right ) = \frac{\partial U}{\partial r}\left ( r , x \right ) = \left ( n-1 \right )\left [ F\left ( x+r \right ) - F\left ( x \right )\right ]^{n-2} \times f\left ( x+r \right )

The density of the range R is therefore

h\left ( r \right ) =n \int_{-\infty}^{+\infty} u\left ( r, x\right )f\left ( x \right )dx

or, in terms of the distribution of the X_i,

h\left ( r \right ) =n \left ( n-1 \right ) \int_{-\infty}^{+\infty} \left [ F\left ( x+r \right ) - F\left ( x \right )\right ]^{n-2}\times f\left ( x+r \right )f\left ( x \right )dx

No Universal Simplification
The general case doesn’t simplify any further. We get a simple formula for the uniform distribution and approximations for large samples but, otherwise, even for Gaussian variables, we have to use numerical methods.
Noting that nothing is changed in this formula if you change the variable from x to x+\mu, it is clear that the range distribution does not depend on the mean of the X_i. There is no loss of generality in assuming their mean to be zero.
On the other hand, the range distribution is related to the standard deviation \sigma of the X_i and quantifying this relationship at least for Gaussian X_i is our purpose in studying the range distribution.
Sample Range for a Uniform Distribution
In this case, f= \mathbf{I}_{\left [ 0,1 \right ]} is 1 over the interval between 0 and 1, and 0 everywhere else, and
F\left ( x \right )=\left\{\begin{array}{l}0 \,\text{if}\, x <0\\x \,\text{if}\, 0 \leq x \leq 1\\1 \,\text{if}\, x > 1\\\end{array}\right.
then

f\left ( x+r \right ) = \mathbf{I}_{[0,1]}\left ( x+r \right )

where \mathbf{I}_{[0,1]} is the indicator function of the interval [0,1]. Since

\mathbf{I}_{[0,1]}\left ( x \right ) \times \mathbf{I}_{[0,1]}\left ( x +r\right ) = \mathbf{I}_{[0,1]\bigcap [-r, 1-r]} (x) = \mathbf{I}_{[0,1-r]}\left ( x \right )

the integrand in h\left ( r \right ) is nonzero only for r \in \left [ 0,1 \right ] and x \in \left [ 0,1-r \right ], and

h\left ( r \right ) = n\left ( n-1 \right )r^{n-2}\int_{0}^{1-r} dx = n\left ( n-1 \right )r^{n-2}\left ( 1-r \right )

The cumulative distribution function is

H\left ( r \right ) = nr^{n-1 }- \left ( n-1\right )r^n
The range distribution is tractable for the uniform distribution, but this distribution only occurs with measured process variables in rare cases, for example where the distribution is much wider than the spec and production involves binning.
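The range of n i.i.d. uniform(0,1) variables follows a Beta(n-1, 2) distribution, with c.d.f. H(r) = nr^{n-1} - (n-1)r^n; a quick Python simulation confirms this (the seed and sample count are arbitrary choices):

```python
import random

def uniform_range_cdf(r, n):
    # c.d.f. of the range of n i.i.d. uniform(0,1) variables:
    # H(r) = n r^(n-1) - (n-1) r^n
    return n * r ** (n - 1) - (n - 1) * r ** n

random.seed(0)
n, trials = 5, 100_000
hits = 0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    if max(xs) - min(xs) <= 0.5:
        hits += 1
empirical = hits / trials
# uniform_range_cdf(0.5, 5) = 0.1875; the simulated frequency is close to it
```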
The case of Gaussians
The measurements on manufacturing workpieces are not uniformly distributed, and their rational subgroups, of 5 to 10 points, are small samples. Consequently, the above calculations don’t apply. The control chart model is that, in a state of statistical control, the measurements are the sum of a constant and Gaussian white noise.
How the Range Distribution Scales
We know that the range distribution is independent of the mean, in general, but we need to explore its relationship with the standard deviation for a sample of centered Gaussians N(0, \sigma). For any of the X_i, i = 1,\dots, n, the p.d.f. is
f\left ( x \right )= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left ( \frac{x}{\sigma} \right )^2}
which is of the form f\left ( x \right )= \frac{1}{\sigma}\phi\left ( \frac{x}{\sigma} \right ) where \phi\left ( x \right )= \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} is the p.d.f. for N(0,1), and the c.d.f. is of the form:
F\left ( x \right ) = \Phi\left (\frac{x}{\sigma}\right )

Therefore

h\left ( r \right ) =n \left ( n-1 \right ) \int_{-\infty}^{+\infty} \left [ \Phi\left (\frac{ x+r }{\sigma}\right ) - \Phi\left (\frac{ x }{\sigma}\right )\right ]^{n-2}\times\frac{1}{\sigma^2} \phi\left ( \frac{ x+r }{\sigma}\right )\phi\left ( \frac{ x }{\sigma} \right )dx

and, by changing the variable to z= x/\sigma,

h\left ( r \right ) =\frac{n \left ( n-1 \right )}{\sigma} \int_{-\infty}^{+\infty} \left[\Phi\left (z + \frac{ r }{\sigma}\right ) - \Phi\left (z \right )\right ]^{n-2}\times \phi\left ( z+\frac{ r }{\sigma}\right )\phi\left (z \right )dz = \frac{1}{\sigma}h_1\left ( \frac{r}{\sigma} \right )

where h_1 is the p.d.f. for the range when \sigma = 1:

h_1\left ( r \right ) =n\left ( n-1 \right )\int_{-\infty}^{+\infty}\left [ \Phi\left ( x+r \right ) - \Phi\left ( x \right )\right ]^{n-2}\phi\left ( x+r \right )\phi\left ( x \right )dx
This means that the distribution of the range scales exactly like the distribution of the X_i, and that we only need to determine the distribution of R for \sigma =1.
For other values of \sigma, the range distribution scales like the Gaussian itself. From the formula, we can see that, for n>2, h_1(0) =0 , but it’s not so for n=2.
In fact, in that case, R =|X_1 -X_2|, the absolute value of the difference between two independent N(0,1) variables. The difference X_1 - X_2 is Gaussian N(0, \sqrt{2}), and its absolute value is a “folded Gaussian,” as discussed about the moving range chart in a post about the XmR chart.
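For n = 2 this can be checked numerically: the integral formula for h_1 must reproduce the folded-Gaussian density of |X_1 - X_2|. A Python sketch (the post itself used R; NormalDist and the trapezoidal grid here are implementation choices):

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def h1(r, n, grid=4001, lo=-8.0, hi=8.0):
    # p.d.f. of the range of n i.i.d. N(0,1) variables (sigma = 1):
    # h1(r) = n(n-1) * Int [Phi(x+r)-Phi(x)]^(n-2) phi(x+r) phi(x) dx,
    # approximated with a trapezoidal rule on [lo, hi].
    step = (hi - lo) / (grid - 1)
    total = 0.0
    for i in range(grid):
        x = lo + i * step
        w = 0.5 if i in (0, grid - 1) else 1.0
        total += w * (Z.cdf(x + r) - Z.cdf(x)) ** (n - 2) * Z.pdf(x + r) * Z.pdf(x)
    return n * (n - 1) * total * step

def folded_pdf(r):
    # density of |X1 - X2| where X1, X2 ~ N(0,1): folded N(0, sqrt(2))
    return 2 * NormalDist(0.0, math.sqrt(2)).pdf(r)

for r in (0.5, 1.0, 2.0):
    assert abs(h1(r, 2) - folded_pdf(r)) < 1e-5
```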
Bounds and Special Values
For all n,
h_1\left ( r \right ) \leq n\left ( n-1 \right )\int_{-\infty}^{+\infty}\phi\left ( x + r \right )\phi(x)dx = n\left ( n-1 \right )\frac{1}{2\sqrt{\pi}}e^{-\frac{r^2}{4}}
because the integrand is the p.d.f. at r of the sum of two Gaussians N(0,1). This cap tells us how far it makes sense to calculate h_1(r). For example, for n=30 and r = 7, it tells us that h_1(r) \leq 0.0012.
For n=3,

\frac{\mathrm{d} h_1}{\mathrm{d} r}\left ( 0 \right ) = \frac{n\left ( n-1 \right )}{2\pi} = 0.955

For n>3,

\frac{\mathrm{d} h_1}{\mathrm{d} r} \leq \frac{n\left ( n-1 \right )}{2\pi}\int_{-\infty}^{+\infty}\frac{\partial }{\partial r}\left [ \Phi\left ( x+r \right ) - \Phi\left ( x \right )\right ]^{n-2}dx

or

\frac{\mathrm{d} h_1}{\mathrm{d} r} \leq \frac{n\left ( n-1 \right )\left ( n-2 \right )}{2\pi}\int_{-\infty}^{+\infty}\left [ \Phi\left ( x+r \right ) - \Phi\left ( x \right )\right ]^{n-3}\phi\left ( x+r \right ) dx

and h_1 starts out flat at 0: h_1\left ( 0 \right ) = 0 and \frac{\mathrm{d} h_1}{\mathrm{d} r}\left ( 0 \right ) = 0

Expected Value of the Range Distribution

Let s = \frac{r}{\sigma}. Then

E\left ( R \right ) = \int_{0}^{\infty} r h\left ( r \right )dr = \int_{0}^{\infty} \frac{r }{\sigma}h_1\left ( \frac{r }{\sigma}\right )dr = \sigma \int_{0}^{\infty} s h_1\left ( s\right )ds= \sigma \times d_2

This means that the expected value of the range R is proportional to \sigma and to its value d_2 for \sigma =1, which depends only on the sample size n.

Standard Deviation of the Range Distribution

First, using the same variable change as above,

E\left ( R^2 \right ) = \int_{0}^{\infty}r^2\times\frac{1}{\sigma}h_1\left ( \frac{r}{\sigma} \right )dr = \sigma^2\int_{0}^{\infty}s^2h_1\left (s \right )ds

Then
\sigma_R^2 = Var\left ( R \right ) = \sigma^2\int_{0}^{\infty}s^2h_1\left (s \right )ds -\left [ E\left ( R \right ) \right ]^2 = \sigma^2\left [\int_{0}^{\infty}s^2h_1\left (s \right )ds - d_2^2\right ]
and the standard deviation is
\sigma_R = \sigma\sqrt{\int_{0}^{\infty}s^2h_1\left (s \right )ds - d_2^2} = \sigma \times d_3 = \frac{d_3}{d_2}E\left ( R \right )
The standard deviation \sigma_R of R is proportional to \sigma, with proportionality constant d_3, its value for \sigma = 1.
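With h_1 in hand, d_2 and d_3 follow by numerical integration of the moments. A Python sketch (the post used R’s integrate(); the trapezoidal grids and truncation points here are arbitrary implementation choices):

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def h1(r, n, grid=1001, lo=-8.0, hi=8.0):
    # p.d.f. of the range of n i.i.d. N(0,1) variables, trapezoidal rule
    step = (hi - lo) / (grid - 1)
    total = 0.0
    for i in range(grid):
        x = lo + i * step
        w = 0.5 if i in (0, grid - 1) else 1.0
        total += w * (Z.cdf(x + r) - Z.cdf(x)) ** (n - 2) * Z.pdf(x + r) * Z.pdf(x)
    return n * (n - 1) * total * step

def d2_d3(n, grid=401, hi=8.0):
    # d2 = E(R) and d3 = sqrt(E(R^2) - d2^2) for sigma = 1
    step = hi / (grid - 1)
    m1 = m2 = 0.0
    for i in range(grid):
        r = i * step
        w = 0.5 if i in (0, grid - 1) else 1.0
        d = w * h1(r, n) * step
        m1 += r * d
        m2 += r * r * d
    return m1, math.sqrt(m2 - m1 * m1)

d2, d3 = d2_d3(5)
# d2 and d3 come out close to the tabulated 2.326 and 0.864 for n = 5
```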
Numerical Approximations
The plots and tables below use the following R tools:
The built-in functions in R for the p.d.f. and c.d.f. of the Gaussian distribution.
The integrate() function for numerical integration.
The approxfun() function to interpolate between points.
The ggplot2 package for charting.
Range Distributions
The plots of densities and cumulative distributions of the range for samples of independent Gaussian variables of increasing sizes are as follows:
These densities are definitely not Gaussian for n=2 and n=3 and still heavily skewed for n=5. This matters because range charts are usually drawn for small samples.
As n rises, the distribution looks more and more bell-shaped and less skewed, which may tempt us, for large n, to use the Gaussian distribution as a model, with mean E\left ( R \right ) and standard deviation \sigma_R. For \sigma =1 and n=30,
E\left ( R \right ) = d_2 = 4.0830
and
\sigma_R = \frac{d_3}{d_2}E\left ( R \right )= 0.6969
Let’s plot both the range distribution for n=30 and the Gaussian N\left (4.0830, 0.6969\right ):
Overall, the two distributions look close. However, it is the tails that matter most in the setting of control limits. In this case, the upper tail matters most, because it’s used to generate alarms on increases in the range from assignable causes. As we’ll see below, the SPC literature sets control limits for range charts based on this Gaussian approximation for all n. As a result, the \pm 3\sigma limits do not encompass 99.73% of the distribution when the process is under statistical control.
Range-based Control Limits for the \bar{X} chart
For a sample size of n, the control limits on \bar{X} are set at \mu \pm 3\frac{\sigma}{\sqrt{n}}
Since \sigma = \frac{E\left ( R \right )}{d_2}, in terms of E\left ( R \right ), these limits are at \mu \pm \frac{3}{\sqrt{n}d_2} E\left ( R \right )
Therefore \mathbf{A2} = \frac{3}{\sqrt{n}d_2}
is as in the following table:
n     1/d_2      A2
2     0.88623    1.880
3     0.59081    1.023
5     0.42994    0.577
10    0.32494    0.308
20    0.26780    0.180
30    0.24491    0.134
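The A2 column is just 3/(\sqrt{n}\,d_2) applied to the 1/d_2 column; for example, in Python (the d_2 values are the tabulated ones used throughout this post):

```python
import math

# A2 = 3 / (sqrt(n) * d2), using tabulated d2 values for n = 2, 3, 5, 10
d2_values = {2: 1.128375, 3: 1.692548, 5: 2.325927, 10: 3.077423}
a2 = {n: 3 / (math.sqrt(n) * d) for n, d in d2_values.items()}
# rounds to 1.880, 1.023, 0.577, 0.308 -- the A2 column above
```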
This justifies the use of the average range \bar{r} to set control limits on the \bar{X} chart. For the range chart, we need to start with the method used for setting limits on the \sigma chart of within-sample standard deviations.
Control Limits for the Range Chart in the SPC literature
The SPC literature sets the control limits for the range chart at E\left ( R \right ) \pm 3\sigma_R, which works out to
UCL = \left ( 1 + 3\frac{d_3}{d_2} \right )E\left ( R \right ) = D4\times E\left ( R \right )
LCL = max\left [0, \left ( 1 - 3\frac{d_3}{d_2} \right ) \right ]E\left ( R \right ) = D3\times E\left ( R \right )

Critique of the Range Chart Control Limits
Why use \pm 3\sigma_R limits on the range chart? For the \bar{X} chart, they apply to the extent that the sample averages are Gaussian. Then, the values will fall between the limits with a probability of 99.73%, unless an assignable cause has shifted the mean.
Meaning of the Threshold
It is an arbitrary threshold that Walter Shewhart chose in 1924 because it was “about the magnitude customarily used in engineering practice.” (Statistical Method from the Viewpoint of Quality Control, p.62). Even though Shewhart’s first book, in 1931, was called “Economic Control of Quality,” it contains no reference to a loss function to be minimized or an expected utility to be maximized. It’s understandable, given that Von Neumann & Morgenstern first introduced these concepts in statistical decision theory in 1947. The question here, however, is not whether setting limits to encompass 99.73% of the distribution is, in some sense, economic, but whether the limits in the recipe actually do it.
Negative Lower Limits for Ranges?
Ranges, by definition, are never negative, but there is nothing to prevent the lower limit E\left ( R \right ) - 3\sigma_R from being negative, and we have to replace these values with 0 for the limits to make sense. Logically, this isn’t satisfying.
Asymmetry of Low and High Ranges
With \bar{X}, excursions on either side of the \left [ LCL, UCL \right ] interval are alarms about the process being out of control. In the R chart, there is no such symmetry. Going above the UCL is indeed an alarm, but going below the LCL, if validated, means that the variability of the process is reduced, which is a cause for celebration, not alarm.
p-Values of Control Limits for Ranges
In the words of today’s statisticians, it means that the test of whether an instance of \bar{X} is between the control limits has a p-value of .0027. Let’s zero in on \sigma =1. Then UCL = D4\times d_2 and LCL = D3\times d_2. The p-values are therefore: p = 1-\left [ H_1\left ( UCL \right ) - H_1\left (LCL \right ) \right ], which we can calculate for various values of n:
n    d_2       D3     D4     LCL    UCL    p-value  p-value/.0027
2    1.128375  0      3.267  0      3.686  0.0091   337%
3    1.692548  0      2.575  0      4.358  0.0058   214%
5    2.325927  0      2.115  0      4.919  0.0046   170%
10   3.077423  0.223  1.777  0.686  5.469  0.0044   163%
20   3.734134  0.414  1.586  1.546  5.922  0.0047   174%
30   4.082959  0.488  1.512  1.992  6.173  0.0051   189%
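For n = 2 the first row can be verified in closed form: the range is a folded N(0, \sqrt{2}), so H_1(r) = 2\Phi(r/\sqrt{2}) - 1. A Python sketch:

```python
import math
from statistics import NormalDist

# For n = 2, R = |X1 - X2| is a folded N(0, sqrt(2)), so
# H1(r) = 2 * Phi(r / sqrt(2)) - 1.
def H1_n2(r):
    return 2 * NormalDist().cdf(r / math.sqrt(2)) - 1

d2, D4 = 1.128375, 3.267
ucl = D4 * d2          # upper control limit, about 3.686
p = 1 - H1_n2(ucl)     # probability of a false alarm (LCL being 0)
# p is approximately 0.0091, matching the first row of the table
```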
Alternative Approach to Range Control Limits
Since we know the c.d.f. H_1 numerically, we can use it to set limits so that, under statistical control, the range lies between the limits 99.73% of the time, as with the other kinds of charts. For both tails to be equal in probability at 0.135% and \sigma =1, we can set the limits at UCL = H_1^{-1} \left (.99865\right) and LCL =H_1^{-1} \left (.00135\right)
n    H_1^{-1}(.00135)   H_1^{-1}(.99865)
2    0.0024             4.5338
3    0.0700             4.9514
5    0.3900             5.3783
10   1.1000             5.8799
20   1.900              6.3571
30   2.400              6.6851
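Again for n = 2, inverting the closed-form c.d.f. gives H_1^{-1}(q) = \sqrt{2}\,\Phi^{-1}((1+q)/2), which reproduces the first row above to within rounding. A Python sketch:

```python
import math
from statistics import NormalDist

# For n = 2, H1(r) = 2*Phi(r/sqrt(2)) - 1, so the quantile function is
# H1^{-1}(q) = sqrt(2) * Phi^{-1}((1 + q) / 2).
def H1_inv_n2(q):
    return math.sqrt(2) * NormalDist().inv_cdf((1 + q) / 2)

lcl = H1_inv_n2(0.00135)   # close to the tabulated 0.0024
ucl = H1_inv_n2(0.99865)   # close to the tabulated 4.5338
```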
The following picture illustrates this process for n=5:
Estimating the Standard Deviation
The literature commonly fails to distinguish between the standard deviation \sigma and the sample standard deviation s. The first is a parameter of a random variable X; the second is its estimate from a sample x_i, i = 1,\dots, n:
\sigma = \sqrt{E \left [ X-E\left ( X \right ) \right ]^2} and
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left ( x_i- \bar{x} \right )^2}
A random variable does not always have a standard deviation. When it does, s^2 is an unbiased estimator of \sigma^2. This does not mean that s is an unbiased estimator of \sigma; in fact, Shewhart introduced a factor now called c_4 to correct this bias for \bar{X}-s charts.
But, for Gaussian variables,
\sigma= \frac{1}{d_2}\times E \left ( R \right ) = \frac{1}{d_2}\times E \left ( \bar{r} \right )
which means that you can use the average range to get an unbiased estimate of \sigma.
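A simulation illustrates this estimator for subgroups of size 5 (the seed, subgroup count, and true \sigma = 3 are arbitrary choices; d_2 = 2.3259 is the tabulated value):

```python
import random
import statistics

random.seed(42)
n, k, sigma = 5, 20_000, 3.0   # subgroup size, subgroup count, true sigma
d2 = 2.3259                    # tabulated d2 for n = 5

ranges = []
for _ in range(k):
    s = [random.gauss(10.0, sigma) for _ in range(n)]
    ranges.append(max(s) - min(s))

sigma_hat = statistics.fmean(ranges) / d2   # range-based estimate of sigma
# sigma_hat comes out close to the true sigma of 3.0
```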
Conclusions
We now know who developed the range chart. We also know what problem they were trying to solve, and the underlying theory behind their solution. The next question is its relevance 100 years later. In 2025, we can admire the ingenuity with which Shewhart, Dodge, and others worked around the limitations of their environment. But we are not facing the same technical or human constraints.
Shewhart, Walter A. (1939). Statistical Method from the Viewpoint of Quality Control (Dover Books on Mathematics) (p. 62). Kindle Edition.
Woodall, W. H. and Montgomery, D. C. (2000-01), Using Ranges to Estimate Variability, Quality Engineering, Vol. 13, No. 2, pp. 211-217.
Von Neumann, J., Morgenstern, O. (1947). Theory of Games and Economic Behavior: 60th Anniversary Commemorative Edition. United Kingdom: Princeton University Press.
Nov 7 2025
The Lowdown on the Range Chart
This is about the motivation for the R chart and its math. We shouldn’t ask manufacturing professionals to apply a technical tool without explaining its purpose and its theory.
However, without doing either, the SPC literature promotes the use of the R chart to detect changes in the fluctuations of measured variables, along with \bar{X} charts for changes in their means. The books provide recipes for using these charts, but no explanation.
Harold Dodge introduced the R chart 100 years ago to overcome shop floor pushback against calculating sample standard deviations with paper, pencil, and slide rules. While easier to understand and to use daily, sample ranges are mathematically more complex and more sensitive to extreme values than standard deviations.
Like all control charts, the R chart uses limits calculated for the Gaussian distribution. As no simple formula is available for the R chart, setting control limits for it requires numerical approximations that must have consumed months for human computers in 1924.
Today, you can replicate them instantaneously with software. These calculations reveal that the \pm 3\sigma limits in the books for the range chart do not actually encompass the 99.73% of the distribution that they do in \bar{X} charts.
The R chart was an ingenious workaround to technical and human constraints of the 1920s that no longer exist. Today, rather than blindly applying these tools, we must draw inspiration from their inventors and develop solutions to meet the process capability challenges we are actually facing.
Contents
Why the Range Chart?
The idea of using the within-sample range of a numeric variable instead of its standard deviation is commonly attributed to Walter Shewhart, but there is no discussion of it in his books. On the other hand, the page about Harold Dodge on the ASQ website, quotes him as saying:
“In 1924, our work in cooperation with shop engineers was influenced heavily by great pressures to save money and to make the quality control methods simple and easy to use.
Initially, the basic procedures for variables called for samples of four, with one chart for the average, x̄ , and another for the standard deviation, σ. Shop reaction was prompt against anything as complicated as computing the standard deviation.
After some study we proposed the use of the range, R. On top of that we proposed shop use of samples of five instead of four; it is easier to divide by five than by four. These simplifying steps quickly became the basis for shop practice.”
So, Shewhart’s first disciples replaced his \bar{X}-\sigma charts with the \bar{X}-R charts, not because the range is a better statistic but to defuse a mutiny.
Now that either one is a click away, it is hard to imagine how engineers worked 100 years ago. With paper and pencil, the range of 5 or 10 numbers was easier to compute than the square root of a sum of squares, and it was also easier to understand.
Sensitivity to Extreme Values
In fact, being based entirely on the extreme values within the sample, the range is obviously more sensitive to outliers than the sample standard deviation.
About the Math of Ranges
From the standpoint of inference, however, the math of ranges is more complex. There is, in fact, no known simple formula for the distribution of a sample range, except for uniform variables, and as approximations for large samples.
While many SPC authors keep asserting that measurements do not need a Gaussian (also known as “Normal”) distribution for Control Charts to work, the fact is that the math used to set control limits is based on the assumption that, in a state of statistical control, they are the sum of a constant central value and a Gaussian white noise.
As a consequence, to base limits on sample ranges, you need a model for the distribution of the range of a sample of independent Gaussian variables with the same mean and standard deviation.
The Recipe for Range Charts
The SPC literature provides a recipe for the range chart but omits the math that justifies it. In your process capability study, you start with a sequence of measurements that is representative of your process in the absence of assignable causes of disruption.
You first split it into a sequence k rational subgroups of size n. That’s kn data points x_{ij} with i=1,\dots, k and j=1,\dots, n. You compute the average and the range of the measurements in each subgroup, i = 1,\dots, k:
\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n}x_{ij}
r_i = \max_{j =1,\dots,n}\left ( x_{ij} \right ) - \min_{j =1,\dots,n}\left (x_{ij} \right )which you summarize as
\bar{\bar{x}} = \frac{1}{k}\sum_{i=1}^{k}\bar{x}_i and
\bar{r} = \frac{1}{k}\sum_{i=1}^{k}r_iThen you look up coefficients in the following table, which vary with the subgroup size n:
Using these coefficients, you set up the \bar{X} chart with the following parameters:
And the R chart:
Then you plot the average and range of data on new samples against these limits, for evidence of special causes of variation.
This is the cookbook version. It provides no rationale for the values of A2, D3, or D4, and none of the assumptions about the x_{ij} that are needed for them to be valid. Let us fill this gap.
The Distribution of Sample Ranges
The x_{ij}, j = 1,\dots, n in any one of the k subgroups are a sequence of measurements, but we will assume that their order makes no difference and that they are instances of a sample \mathbf{X} = \left ( X_1, \dots, X_n \right ) of n independent and identically distributed (i.i.d.) random variables, with a probability distribution function (p.d.f)f and cumulative distribution function (c.d.f.) F. We also assume that f is continuous and that this distribution has a mean \mu and a standard deviation \sigma.
We are going to study the distribution of the random variable
R = \max_{j =1,\dots,n}\left ( X_j \right ) - \min_{j =1,\dots,n}\left (X_j \right )Generic Model
Because f is continuous, we can neglect the probability that two or more of the X_i take the same value, and the probability that one of the n variables is between x and x+dx is
P(x \le X_1< x+dx) + \dots +P(x \le X_n< x+dx) = nf\left (x \right )dx.
Then the probability that the n-1 other X_i are all in [x, x+r] is
U\left ( r , x \right ) = \left [ F\left ( x+r \right ) - F\left ( x \right )\right ]^{n-1}For a sample of n points, this looks as follows:
and the corresponding density is
u\left ( r , x \right ) = \frac{\partial U}{\partial r}\left ( r , x \right ) = \left ( n-1 \right )\left [ F\left ( x+r \right ) - F\left ( x \right )\right ]^{n-2} \times f\left ( x+r \right )The density of the range R is therefore
h\left ( r \right ) =n \int_{-\infty}^{+\infty} u\left ( r, x\right )f\left ( x \right )dxOr, in terms of the distribution of the X_i,
h\left ( r \right ) =n \left ( n-1 \right ) \int_{-\infty}^{+\infty} \left [ F\left ( x+r \right ) - F\left ( x \right )\right ]^{n-2}\times f\left ( x+r \right )f\left ( x \right )dxNo Universal Simplification
The general case doesn’t simplify any further. We get a simple formula for the uniform distribution and approximations for large samples but, otherwise, even for Gaussian variables, we have to use numerical methods.
Noting that nothing is changed in this formula if you change the variable from x to x+\mu, it is clear that the range distribution does not depend on the mean of the X_i. There is no loss of generality is assuming their mean to be zero.
On the other hand, the range distribution is related to the standard deviation \sigma of the X_i and quantifying this relationship at least for Gaussian X_i is our purpose in studying the range distribution.
Sample Range for a Uniform Distribution
In this case, f= \mathbf{I}_{\left [ 0,1 \right ]} is 1 over the interval between 0 and 1, and 0 everywhere else and
F\left ( x \right )=\left\{\begin{array}{l}0 \,\text{if}\, x <0\\x \,\text{if}\, 0 \leq x \leq 1\\1 \,\text{if}\, x > 1\\\end{array}\right.then
f\left ( x+r \right ) = \mathbf{I}_{[0,1]}\left ( x+r \right )where I_{[0,1]} is the indicator function of the interval [0,1]. Since
\mathbf{I}_{[0,1]}\left ( x \right ) \times \mathbf{I}_{[0,1]}\left ( x +r\right ) = \mathbf{I}_{[0,1]\bigcap [-r, 1-r]} (x) = \mathbf{I}_{[0,1-r]}\left ( x \right )the integrand in h\left ( r \right ) is nonzero only for r \in \left [ 0,1 \right ] and x \in \left [ 0,1-r \right ] , and
h\left ( r \right ) = n\left ( n-1 \right )r^{n-2}\int_{0}^{1-r} dx = n\left ( n-1 \right )r^{n-2}\left ( 1-r \right )The cumulative distribution function is
H\left ( r \right ) = nr^{n-1 }- \left ( n-1\right )r^nThe range distribution is tractable for the uniform distribution, but this distribution only occurs with measured process variables in rare cases, for example where the distribution is much wider than the spec and production involves binning.
The case of Gaussians
The measurements on manufacturing workpieces are not uniformly distributed, and their rational subgroups, of 5 to 10 points, are small samples. Consequently, the above calculations don’t apply. The control chart model is that, in a state of statistical control, the measurements are the sum of a constant and Gaussian white noise.
How the Range Distribution Scales
We know that the range distribution is independent of the mean, in general, but we need to explore its relationship with the standard deviation for a sample of centered Gaussians ~N(0, \sigma). For any of the X_i, i = 1,\dots, n, the p.d.f. Is
f\left ( x \right )= \frac{1}{\sigma\sqrt{2\pi}}e^{\frac{1}{2}\left ( \frac{x}{\sigma} \right )^2}which is of the form f\left ( x \right )= \frac{1}{\sigma}\phi\left ( \frac{x}{\sigma} \right ) where \phi\left ( x \right )= \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} is the p.d.f. for N(0,1) and the c.d.f. Is of the form:
F\left ( x \right ) = \Phi(\frac{x}{\sigma})Therefore
h\left ( r \right ) =n \left ( n-1 \right ) \int_{-\infty}^{+\infty} \left [ \Phi\left (\frac{ x+r }{\sigma}\right ) - \Phi\left (\frac{ x }{\sigma}\right )\right ]^{n-2}\times\frac{1}{\sigma^2} \phi\left ( \frac{ x+r }{\sigma}\right )\phi\left ( \frac{ x }{\sigma} \right )dxand, by changing the variable to z= x/\sigma,
h\left ( r \right ) =\frac{n \left ( n-1 \right )}{\sigma} \int_{-\infty}^{+\infty} \left[\Phi\left (z + \frac{ r }{\sigma}\right ) - \Phi\left (z \right )\right ]^{n-2}\times \phi\left ( z+\frac{ r }{\sigma}\right )\phi\left (z \right )dz = \frac{1}{\sigma}h_1\left ( \frac{r}{\sigma} \right )where h_1 is the p.d.f. for the range when \sigma = 1:
h_1\left ( r \right ) =n\left ( n-1 \right )\int_{-\infty}^{+\infty}\left [ \Phi\left ( x+r \right ) - \Phi\left ( x \right )\right ]^{n-2}\phi\left ( x+r \right )\phi\left ( x \right )dxThis means that the distribution of the range scales exactly like the distribution of the X_i, and that we only need to determine the distribution of R for \sigma =1.
For other values of \sigma, the range distribution scales like the Gaussian itself. From the formula, we can see that, for n>2, h_1(0) =0 , but it’s not so for n=2.
In fact, in that case, R =|X_1 -X_2|, which is the absolute value of the difference between two independent N(0,1 variables. The difference X_1 - X_2 is Gaussian N(0, \sqrt{2}, and its absolute value, a “folded Gaussian,” as discussed about the moving range chart in a post about the XmR chart.
Bounds and Special Values
For all n,
h_1\left ( r \right ) \leq n\left ( n-1 \right )\int_{-\infty}^{+\infty}\phi\left ( x + r \right )\phi(x)dx = n\left ( n-1 \right )\frac{1}{2\sqrt{\pi}}e^{-\frac{r^2}{4}}because the integrand is the p.d.f. at r of the sum of two Gaussians N(0,1). This cap tells us how far it makes sense to calculate h_1(r). For example, for n=30 and r = 7, it tells us that h_1(r) \leq 0.004.
For n=3,
\frac{\mathrm{d} h_1}{\mathrm{d} r}\left ( 0 \right ) = \frac{n\left ( n-1 \right )}{2\pi} = 0.955For n>3,
\frac{\mathrm{d} h_1}{\mathrm{d} r} \leq \frac{n\left ( n-1 \right )}{2\pi}\int_{-\infty}^{+\infty}\frac{\partial }{\partial r}\left [ \Phi\left ( x+r \right ) - \Phi\left ( x \right )\right ]^{n-2}dxor
\frac{\mathrm{d} h_1}{\mathrm{d} r} \leq \frac{n\left ( n-1 \right )\left ( n-2 \right )}{2\pi}\int_{-\infty}^{+\infty}\left [ \Phi\left ( x+r \right ) - \Phi\left ( x \right )\right ]^{n-3}\phi\left ( x+r \right ) dxand h_1 starts out flat at 0: h_1\left ( 0 \right ) = 0 and \frac{\mathrm{d} h_1}{\mathrm{d} r}\left ( 0 \right ) = 0
Expected Value of the Range Distribution
Let s = \frac{r}{\sigma}. Then
E\left ( R \right ) = \int_{0}^{\infty} r h\left ( r \right )dr = \int_{0}^{\infty} \frac{r }{\sigma}h_1\left ( \frac{r }{\sigma}\right )dr = \sigma \int_{0}^{\infty} s h_1\left ( s\right )ds= \sigma \times d_2It means that the expected value of the range R is proportional to \sigma and to its value d_2 for \sigma =1, which depends only on the sample size n.
Standard Deviation of the Range Distribution
First, using the same variable change as above,
E\left ( R^2 \right ) = \int_{0}^{\infty}r^2\times\frac{1}{\sigma}h_1\left ( \frac{r}{\sigma} \right )dr = \sigma^2\int_{0}^{\infty}s^2h_1\left (s \right )dsThen
\sigma_R^2 = Var\left ( R \right ) = \sigma^2\int_{0}^{\infty}s^2h_1\left (s \right )ds -\left [ E\left ( R \right ) \right ]^2 = \sigma^2\left [\int_{0}^{\infty}s^2h_1\left (s \right )ds - d_2^2\right ]and the standard deviation is
\sigma_R = \sigma\sqrt{\int_{0}^{\infty}s^2h_1\left (s \right )ds - d_2^2} = \sigma \times d_3 = \frac{d_3}{d_2}E\left ( R \right )The standard deviation \sigma_R of R is proportional to \sigma and to its value d_3 for \sigma =1
Numerical Approximations
The plots and tables below use the following R tools:
Range Distributions
The plots of densities and cumulative distributions of the range for samples of independent Gaussian variables of increasing sizes are as follows:
These densities are definitely not Gaussian for n=2 and n=3 and still heavily skewed for n=5. This matters because range charts are usually drawn for small samples.
As n rises, the distribution looks more and more bell-shaped and less skewed, which may tempt us, for large n, to use the Gaussian distribution as a model, with mean E\left ( R \right ) and standard deviation \sigma_R. For \sigma =1 and n=30,
E\left ( R \right ) = d_2 = 4.0830and
\sigma_R = \frac{d_3}{d_2}E\left ( R \right )= 0.6969Let’s plot both the range distribution for n=30 and the Gaussian N\left (4.0830, 0.6969\right ):
Overall, the two distributions look close. However, it is the tails that matter most in the setting of control limits. In this case, the upper tail matters most, because it’s used to generate alarms on increases in the range from assignable causes. As we’ll see below, the SPC literature sets control limits for range charts based on this Gaussian approximation for all n. As a the result, that the \pm 3\sigma limits do not encompass 99.73% of the distribution when under statistical control.
Range-based Control Limits for the \bar{X} chart
For a sample size of n, the control limits on \bar{X} are set at \mu \pm 3\frac{\sigma}{\sqrt{n}}
Since \sigma = \frac{E\left ( R \right )}{d_2}, in terms of E\left ( R \right ), these limits are at \mu \pm \frac{3}{\sqrt{n}d_2} E\left ( R \right )
Therefore, \mathbf{A2} = \frac{3}{\sqrt{n}d_2}, with values as in the following table:
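The A2 factors in the published tables can be recomputed directly from the integral formula for d_2. A minimal Python sketch (standing in for the article's R tools):

```python
# A sketch: recompute A2 = 3 / (sqrt(n) * d_2) from the integral
# formula for d_2, reproducing the published factor table.
from math import sqrt
from scipy.integrate import quad
from scipy.stats import norm

def d2(n):
    # E(R) for n i.i.d. standard Gaussian variables
    return quad(lambda x: 1 - norm.cdf(x)**n - norm.sf(x)**n, -8, 8)[0]

def a2(n):
    return 3 / (sqrt(n) * d2(n))

for n in range(2, 11):
    print(f"n={n:2d}  A2={a2(n):.3f}")
```

The results agree with the standard tables, e.g. A2 = 1.880 for n = 2 and A2 = 0.577 for n = 5.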
This justifies the use of the average range \bar{r} to set control limits on the \bar{X} chart. For the range chart, we need to start with the method used for setting limits on the \sigma chart of within-sample standard deviations.
Control Limits for the Range Chart in the SPC literature
The SPC literature sets the control limits for the range chart at E\left ( R \right ) \pm 3\sigma_R, which works out to
UCL = \left ( 1 + 3\frac{d_3}{d_2} \right )E\left ( R \right ) = D4\times E\left ( R \right )
LCL = \max\left [0, \left ( 1 - 3\frac{d_3}{d_2} \right ) \right ]E\left ( R \right ) = D3\times E\left ( R \right )
Critique of the Range Chart Control Limits
Why use \pm 3\sigma_R limits on the range chart? For the \bar{X} chart, they apply to the extent that the sample averages are Gaussian: the values then fall between the limits with a probability of 99.73%, unless an assignable cause has shifted the mean.
Meaning of the Threshold
It is an arbitrary threshold that Walter Shewhart chose in 1924 because it was “about the magnitude customarily used in engineering practice.” (Statistical Method from the Viewpoint of Quality Control, p.62). Even though Shewhart’s first book, in 1931, was called “Economic Control of Quality of Manufactured Product,” it contains no reference to a loss function to be minimized or an expected utility to be maximized. It’s understandable, given that Von Neumann & Morgenstern first introduced these concepts in statistical decision theory in 1947. The question here, however, is not whether setting limits to encompass 99.73% of the distribution is, in some sense, economic, but whether the limits in the recipe actually do it.
Negative Lower Limits for Ranges?
Ranges, by definition, are never negative, but there is nothing to prevent the -3\sigma_R limit from being negative, and we have to replace negative values with 0 for the limits to make sense. This isn’t logically satisfying.
Asymmetry of Low and High Ranges
With \bar{X}, excursions outside \left [ LCL, UCL \right ] on either side are alarms about the process being out of control. In the R chart, there is no such symmetry. Going above the UCL is indeed an alarm, but going below the LCL, if validated, means that the variability of the process has decreased, which is a cause for celebration, not alarm.
p-Values of Control Limits for Ranges
In the words of today’s statisticians, encompassing 99.73% of the distribution means that the test of whether an instance of \bar{X} falls between the control limits has a p-value of 0.0027. Let’s zero in on \sigma =1. Then UCL = D4\times d_2 and LCL = D3\times d_2, and the p-value is p = 1-\left [ H_1\left ( UCL \right ) - H_1\left (LCL \right ) \right ], which we can calculate for various values of n:
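A sketch of this calculation, computing D3, D4, and the resulting p-value from the exact c.d.f. by numerical integration:

```python
# A sketch: p-value of the conventional range-chart limits, i.e.
# the probability, under statistical control with sigma = 1, that
# the range falls outside [D3*d_2, D4*d_2].
from math import sqrt
from scipy.integrate import quad
from scipy.stats import norm

def d2(n):
    return quad(lambda x: 1 - norm.cdf(x)**n - norm.sf(x)**n, -8, 8)[0]

def range_cdf(r, n):
    return n * quad(lambda x: norm.pdf(x) *
                    (norm.cdf(x + r) - norm.cdf(x))**(n - 1), -8, 8)[0]

def d3(n):
    er2 = 2 * quad(lambda r: r * (1 - range_cdf(r, n)), 0, 16)[0]
    return sqrt(er2 - d2(n)**2)

def p_value(n):
    ratio = 3 * d3(n) / d2(n)
    ucl = (1 + ratio) * d2(n)           # D4 * d_2
    lcl = max(0.0, 1 - ratio) * d2(n)   # D3 * d_2
    return 1 - (range_cdf(ucl, n) - range_cdf(lcl, n))

for n in (2, 3, 5, 10):
    print(f"n={n:2d}  p={p_value(n):.4f}")
```

For these sample sizes, the p-values come out well above the 0.0027 that \pm 3\sigma limits deliver on a Gaussian variable.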
Alternative Approach to Range Control Limits
Since we know the c.d.f. H_1 numerically, we can use it to set limits so that, under statistical control, the range lies between the limits with probability 99.73%, as for the other kinds of charts. For both tails to hold equal probabilities of 0.135% with \sigma =1, we can set the limits at UCL = H_1^{-1} \left (.99865\right) and LCL =H_1^{-1} \left (.00135\right)
The following picture illustrates this process for n=5:
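The same inversion can be sketched in Python, using root finding on the numerically integrated c.d.f. (a minimal illustration, not the article's R code):

```python
# A sketch: probability-based range limits obtained by inverting
# the exact c.d.f. so that each tail holds exactly 0.135%.
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

def range_cdf(r, n):
    return n * quad(lambda x: norm.pdf(x) *
                    (norm.cdf(x + r) - norm.cdf(x))**(n - 1), -8, 8)[0]

def probability_limits(n, tail=0.00135):
    # Root-find r such that H_n(r) equals the target tail probability
    lcl = brentq(lambda r: range_cdf(r, n) - tail, 1e-9, 20)
    ucl = brentq(lambda r: range_cdf(r, n) - (1 - tail), 1e-9, 20)
    return lcl, ucl

lcl, ucl = probability_limits(5)
print(f"LCL={lcl:.3f}  UCL={ucl:.3f}")  # for sigma = 1, n = 5
```

Unlike the conventional recipe, this produces a strictly positive LCL and tails of exactly 0.135% on each side.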
Estimating the Standard Deviation
The literature commonly fails to distinguish the standard deviation \sigma from its sample estimate s. The first is a parameter of a random variable X; the second is calculated from a sample x_i, i = 1,\dots, n:
\sigma = \sqrt{E \left [ X-E\left ( X \right ) \right ]^2}
and
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left ( x_i- \bar{x} \right )^2}
A random variable does not always have a standard deviation. When it does, s^2 is an unbiased estimator of \sigma^2, but this does not mean that s is an unbiased estimator of \sigma. In fact, Shewhart introduced a factor now called c_4 to correct this bias for \bar{X}-s charts.
But, for Gaussian variables,
\sigma= \frac{1}{d_2}\times E \left ( R \right ) = \frac{1}{d_2}\times E \left ( \bar{r} \right )
which means that you can use the average range to get an unbiased estimate of \sigma.
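Both points are easy to check by simulation. This Monte Carlo sketch uses an illustrative seed and sample count, and the tabulated value d_2 = 2.326 for n = 5:

```python
# A Monte Carlo sketch: s is a biased estimator of sigma, with
# E(s) = c_4 * sigma and c_4 < 1, while rbar / d_2 is unbiased.
from math import gamma, sqrt
import numpy as np

def c4(n):
    # c_4 = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
    return sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

rng = np.random.default_rng(0)                 # illustrative seed
x = rng.standard_normal((200_000, 5))          # 200,000 samples of n=5, sigma=1
mean_s = x.std(axis=1, ddof=1).mean()          # Monte Carlo E(s)
mean_r = (x.max(axis=1) - x.min(axis=1)).mean()
print(f"E(s) ~ {mean_s:.4f} vs c4 = {c4(5):.4f}")  # both near 0.94, not 1
print(f"rbar/d2 ~ {mean_r / 2.326:.4f}")           # near 1, unbiased
```

The average sample standard deviation undershoots \sigma by the factor c_4, while the average range divided by d_2 recovers \sigma without bias.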
Conclusions
We now know who developed the range chart. We also know what problem they were trying to solve, and the underlying theory behind their solution. The next question is its relevance 100 years later. In 2025, we can admire the ingenuity with which Shewhart, Dodge, and others worked around the limitations of their environment. But we are not facing the same technical or human constraints.
#controlchart, #rangechart, #xbarrchart, #spc, #processcontrol, #quality
By Michel Baudin • Quality 0 • Tags: Control Charts, Quality, Range Chart, SPC, Xbar-R chart