1. Introduction
In a nonterminating simulation, we are often interested in long-run (steady-state) average performance measures. Let {Xi: i=1, 2, ...} denote a stochastic process representing the sequence of outputs generated by a single run of a nonterminating probabilistic simulation. If the simulation is in steady-state operation, then the random variables {Xi} will have the same steady-state cumulative distribution function (c.d.f.) FX(x)=Pr{Xi
x} for i=1, 2, ..., and for all real x.
Usually in a nonterminating simulation, we are interested in constructing point and confidence-interval (CI) estimators for some parameter of the steady-state c.d.f. FX(
). In this work, we are primarily interested in estimating the steady-state mean,
; and we limit the discussion to output processes for which E[Xi2]<
so that the process mean
X and process variance
X2=Var[Xi]=E[(Xi-
X)2] are well defined. We let n denote the length of the time series {Xi} of outputs generated by a single, long run of the simulation.
In this article we discuss SBatch, a simplified procedure for steady-state simulation analysis that is based on the method of spaced batch means (Fox et al, 1991), incorporating many advantages of its predecessors ASAP3 (Steiger et al, 2004, 2005) and WASSP (Lada et al, 2003; Lada and Wilson, 2006) while avoiding many of their disadvantages. Based on our experimentation with a diversity of test processes, we found that SBatch had roughly the same sampling efficiency as ASAP3 in some cases and the same ability to eliminate initialization bias as WASSP while being much simpler to implement and understand than either ASAP3 or WASSP.
The rest of this article is organized as follows. In Section 2 we review the simulation analysis method of spaced batch means, and in Section 3 we provide an overview of SBatch. Section 4 contains the detailed operational steps of SBatch. In Section 5 we summarize some of the results of our experimental performance evaluation; and in Section 6 we present our main conclusions. Lada and Wilson (2007) present an abridged version of some of the results that are fully detailed in this article.
2. The method of spaced batch means
In the conventional method of spaced batch means (Fox et al, 1991), the output sequence {Xi: i=1, 2, ..., n} is divided into k batches, each of size m with a spacer of size s preceding each batch, where both s and m are sufficiently large to ensure that the resulting spaced batch means are independent and identically distributed (i.i.d.) normal random variables. For the jth spaced batch, the sample mean is

and the average of all observations beyond the first spacer,

is the point estimator for
X based on the assumption that the first spacer {Xi: i=1, 2, ..., s} contains the entire warm-up period (if there is one). From (1) and (2), we can construct a CI estimator for
X using the (classical) approach detailed in Fox et al (1991).
In this article, we develop a variant of the method of spaced batch means to handle the situation that the batch size m and the spacer size s are sufficiently large so that the resulting spaced batch means approximately constitute a stationary Gaussian process with mean
X but are not necessarily independent. In our experience such a situation is much easier to achieve in practice than other conditions that are required to apply other batch-means procedures. We compute the sample variance of the k spaced batch means for batches of size m,

where

is the grand mean of the spaced batch means. For the user-specified confidence level 1-
(where 0<
<1), we then compute an approximately valid 100(1–
)% correlation-adjusted CI for
X as follows:

where t1–
/2,k-1 is the 1-
/2 quantile of Student's t-distribution with k-1 degrees of freedom; and the correlation adjustment A is applied to the sample variance (3) to compensate for any residual correlation between the spaced batch means. The correlation adjustment A is computed as

where the standard estimator of the lag-one correlation of the spaced batch means is

3. Overview of SBatch
Figure 1 depicts a high-level flow chart of SBatch. The following user-supplied inputs are required:
- a simulation-generated output process from which the steady-state mean response is to be estimated;
- the desired CI coverage probability 1-
; and - an absolute or relative precision requirement specifying the final CI half-length in terms of a maximum acceptable half-length h* (for an absolute precision requirement) or a maximum acceptable fraction r*of the magnitude of the CI midpoint (for a relative precision requirement).
SBatch returns the following outputs:
- a nominal 100(1–
)% CI for the steady-state mean that satisfies the specified precision requirement, provided no additional data are required; or - a new, larger sample size to be supplied to the algorithm.
SBatch begins by dividing the initial simulation-generated output process of length n=16 384 observations into k=1024 adjacent batches of size m=16, with a spacer of initial size s=0 observations preceding each batch. The randomness test of von Neumann (1941) is then applied to the initial set of batch means calculated for each batch. One of the purposes of the randomness test is to determine an appropriate data-truncation point beyond which all computed batch means are approximately independent of the simulation model's initial conditions. A second purpose of the randomness test is to construct a set of spaced batch means such that the interbatch spacer preceding each batch is sufficiently large to ensure all computed batch means are approximately independent and identically distributed, so that subsequently the spaced batch means can be tested for normality.
Each time the randomness test is failed, an additional batch is added to each spacer (up to a limit of 14 batches), and the randomness test is reperformed on the new reduced set of spaced batch means. If the randomness test is failed with a spacer consisting of 14 batches so that only 68 spaced batch means are used in the test, then the spacer size s is reset to zero and both the batch size m and the total sample size n are increased by the factor
; the required additional observations are obtained (by restarting the simulation if necessary); the augmented sample is rebatched into k=1024 nonspaced batches of the new batch size m; and a new set of k batch means is computed and tested for randomness with a spacer of size s=0.
Once the randomness test is passed, the size of the spacer separating each batch is fixed; the observations {Xi: i=1, ..., s} composing the first spacer are discarded to eliminate any warm-up effects; and the resulting set of spaced batch means is tested for normality using the method of Shapiro and Wilk (1965). Each time the normality test is failed, the batch size m is increased by the factor
until the sixth consecutive failure. For each subsequent failure of the normality test (starting with iteration q=7), the batch size m is increased by the factor 21/(q-4) for q=7, 8,... . A new set of spaced batch means is then computed using the final spacer size s determined in the randomness test, and the normality test is repeated for the new set of spaced batch means.
Once the normality test is passed, we test the condition that 0.8 is an upper limit for the lag-one correlation of the resulting set of approximately normal, spaced batch means. Each time the correlation test is failed, the batch size m is increased by 10%, the required additional observations are obtained (by restarting the simulation if necessary), a new set of spaced batch means is computed, and the correlation test is repeated for the new set of spaced batch means.
After the correlation test is passed, a correlation-adjusted CI of form (4) is computed, where the midpoint of the CI is
, the grand average of all the observations beyond the first spacer; and the CI half-length is
, which differs from the usual batch-means formulation by incorporating the correlation adjustment A as well as the sample variance of the batch means. The correlation adjustment A compensates for any residual correlation that may exist between the spaced batch means so that the final result is an approximately valid CI for the steady-state mean.
The CI (4) is then tested to determine whether it satisfies a user-specified absolute or relative precision requirement. If the precision requirement is satisfied, then SBatch delivers the latest CI and terminates. Otherwise, SBatch estimates the number of spaced batch means required to satisfy the precision requirement. If the estimated spaced batch count exceeds 1024, then the batch count is set to 1024 and a new batch size is estimated so that the total delivered sample size is the smallest possible value not less than the total required sample size. If necessary, additional observations are collected and a new set of spaced batch means is computed. The CI (4) is recomputed and the precision requirement is retested.
4. Detailed operational steps of SBatch
4.1. Formal algorithmic statement of SBatch
A formal algorithmic statement of SBatch is given in Figure 2. In Sections 4.2–4.5 we describe the steps of the procedure in more detail.
4.2. Test for randomness
SBatch begins by dividing the initial sample {Xi: i=1, ..., n} of length n=16 384 observations into k=1024 adjacent (nonspaced) batches of size m=16. Batch means are computed according to (1) using initial spacers of size s=0 for each of the k batches. We then apply the randomness test of von Neumann (1941) to the resulting batch means {
1(m, s), ...,
k(m, s)} by computing the ratio of the mean square successive difference of the batch means to the sample variance of the batch means. In SBatch we apply the von Neumann test for randomness iteratively to determine the size of an interbatch spacer that is sufficiently large to yield approximate independence of the corresponding spaced batch means. At the level of significance
ran=0.2, we test the null hypothesis of independent, identically distributed spaced batch means,

by computing the test statistic,

which is a relocated and rescaled version of the ratio of the mean square successive difference of the batch means to the sample variance of the batch means. Since SBatch's test for randomness always involves at least 68 batch means, we use a normal approximation to the null distribution of the test statistic (13) (see Fishman and Yarberry, 1997, p 303). Let z
denote the
quantile of the standard normal distribution for 0<
<1. If

then the hypothesis (12) is accepted; otherwise (12) is rejected so that SBatch must increase the spacer size before retesting (12). As documented in Section 4.2 of Lada and Wilson (2006), we found that setting
ran=0.2 works well in practice and provides an effective balance between errors of type I and II in testing hypothesis (12).
If the k=1024 batch means {
1(m, s),
2(m, s), ...,
k(m, s)} with spacer size s=0 pass the randomness test (12), (13) and (14) at the level of significance
ran, then we set the number of batch means to be used in the normality test according to k'
1024; and we proceed to perform the normality test as detailed in Section 4.3 below. On the other hand, if the k=1024 batch means with spacer size s=0 fail the test for randomness, then we insert spacers each consisting of one ignored batch between the k'
512 remaining batches whose corresponding spaced batch means are to be retested for randomness; and thus the updated spacer size is s
m observations. That is, every other batch, beginning with the second batch, is retained for assignment as one of the spaced batch means; and the alternate batches are ignored.
We reapply the randomness test to the k'=512 remaining spaced batch means {
1(m, s),
2(m, s), ...,
512(m, s)} having batch size m and spacers of size s=m. If the randomness retest is passed, then we proceed to perform the normality test in Section 4.3 with the current values of s and k'; otherwise we add another ignored batch to each spacer so that the spacer size and number of remaining batches are updated according to

After executing (15), we have k'=341 remaining spaced batch means {
1(m, s),
2(m, s), ...,
341(m, s)} with batch size m and spacers of size s=2m; and again the remaining spaced batch means are retested for randomness. This process is continued until one of the following conditions occurs: (i) the randomness test is passed; or (ii) the randomness test is failed and in the update step (15), the batch count k' drops below the lower limit of 68 batches. If condition (i) occurs, then we proceed to the normality test in Section 4.3 with the current values of s and k'. On the other hand, if condition (ii) occurs, then the batch size m, the batch count k, the overall sample size n, and the spacer size s are updated according to

respectively; the required additional observations are obtained (by restarting the simulation if necessary) to complete the overall sample {Xi: i=1, ..., n}; and then k adjacent (nonspaced) batch means are computed from the overall sample according to (1).
At this point we reperform the entire randomness-testing procedure, starting with the current set of k=1024 adjacent (nonspaced) batch means of the current batch size m with spacer size s=0. If the randomness test is passed, then we set k'
k and proceed to the normality test in Section 4.3; otherwise we repeat the steps outlined in the two immediately preceding paragraphs. Once the randomness test is passed, we finalize the spacer size s so that we have a set of k' approximately i.i.d. spaced batch means, where 68
k'
1024.
This approach to handling the simulation start-up problem is very similar to the approach used in WASSP (Lada et al, 2003, 2007; Lada and Wilson, 2006). Through extensive experimentation with the WASSP algorithm as well as with SBatch, we found this approach to be effective in determining an appropriate spacer size s so that:
- The observations {X1, X2, ..., Xs} constituting the first spacer can be regarded as containing the warm-up period because the spaced batch means beyond the first spacer do not exhibit significant departures from randomness—that is, they do not exhibit a deterministic trend or any type of stochastic dependence on the simulation's initial conditions.
- The spaced batch means computed beyond the first spacer are approximately i.i.d. and thus can be meaningfully tested for normality using standard goodness-of-fit tests.
There are a few key differences between the methods used in WASSP and SBatch for eliminating initialization bias, however. WASSP requires an initial sample of size n=4096 and allows a maximum of nine batches per spacer, resulting in a final spaced batch count in the range 25
k'
256. By contrast, SBatch requires an initial sample of size n=16 384 and allows up to 14 batches per spacer, resulting in a final spaced batch count in the range 68
k'
1024. By increasing the initial sample size (and consequently the minimum number of spaced batch means k'), we have increased the sensitivity of both the randomness test and the subsequent test for normality that are applied to the resulting set of spaced batch means.
Increasing the sensitivity of the normality test is critical to the effectiveness of SBatch because the delivered CI (9) is derived from classical results based on Student's t-distribution that require the basic observations to be a stationary Gaussian process. The WASSP algorithm, on the other hand, is less sensitive to departures from normality in the original (unbatched) process {Xi} because (i) in general WASSP requires substantially larger final sample sizes, thus ensuring a central-limit effect in its final point estimator of
X; and (ii) WASSP's estimator of the batch-means log-spectrum at zero frequency appears to converge in distribution rapidly to a scaled chi-squared distribution with 6 degrees of freedom in most applications.
4.3. Testing spaced batch means for normality
To test the spaced batch means for normality, we use the Shapiro–Wilk test (Shapiro and Wilk, 1965) because it is a well-established, powerful, and widely used test for departures from normality in a data set consisting of i.i.d. observations (Royston, 1982, 1993, 1995). Thus in SBatch we apply the Shapiro–Wilk test to the k' spaced batch means with the final spacer size s determined in the preceding test for randomness. To assess the normality of the sample

we start by sorting the observations in ascending order to obtain the order statistics
(1)(m, s)
(2)(m, s)


(k')(m, s). The Shapiro–Wilk test statistic is then computed as follows:

where the coefficients {
k'-ℓ+1:
=1, ...,
k'/2
} are evaluated using the algorithm of Royston (1982). The test statistic W is then compared to the
nor quantile w
nor of the distribution of W under the null hypothesis of i.i.d. normal batch means,

If W
w
nor, then at the level of significance
nor we reject hypothesis (17).
For the first iteration of the normality test, the iteration counter is set to q
1 and the level of significance for the Shapiro–Wilk test is
nor(1)
0.05. In general, if on the qth iteration of the normality test (16), (17) the hypothesis (17) is accepted at the level of significance (7), then we proceed to the test for correlation as outlined in Section 4.4; otherwise, we repeat the following steps until the batch means finally pass the normality test (16), (17):
- The iteration counter q, the batch size m, and overall sample size are increased according to (8); and the required additional observations are obtained (by restarting the simulation if necessary) to complete the overall sample {Xi: i=1, ..., n}.
- The overall data set {X1, ..., Xn} is reorganized into k' spaced batches of size m so that successive batches are separated from each other by spacers each consisting of s observations.
- The spaced batch means {
j(m, s): j=1, ..., k'} are recomputed according to (1). - The level of significance
nor(q) for the current iteration q of the Shapiro–Wilk test is reset according to (7). - The k' spaced batch means {
j(m, s): j=1, ..., k'} are retested for normality at the level of significance
nor(q).
We found through extensive experimentation with WASSP that the scheme in Equation (7) for decreasing the significance level
nor(q) by at least an order of magnitude on each iteration q of the normality test for which q
7 works well for a wide variety of simulation-generated output processes (see Section 4.3 of Lada and Wilson, 2006). However, extensive preliminary experimentation with SBatch indicated that simply decreasing the significance level on each iteration of the normality test was not sufficient to control excessive growth of the batch sizes required to achieve approximate normality of the spaced batch means. As discussed in the previous section, SBatch starts with a larger batch count to increase the sensitivity of the normality test by enforcing a minimum of 68 spaced batch means for the normality test as opposed to 25 spaced batch means for WASSP. However, increasing the sensitivity of the normality test also increases the variability of the final sample size in applications involving highly non-normal data. Consequently, instead of using the WASSP approach of increasing the batch size by the factor
each time the normality test is failed, SBatch increases the batch size by the factor
for the first six consecutive failures of the normality test; and for each failure after that, the batch size is increased according to

This modification to the batch-size inflation factor balances the need for avoiding gross departures from normality of the batch means and avoiding excessive growth in the batch sizes necessary to ensure approximate normality of the batch means.
4.4. Test for acceptable correlation
When estimating CIs based on spaced batch means, we have found that we must avoid situations in which 
(m, s), the lag-one correlation of the spaced batch means, is so close to one as to induce excessive variability in the correlation-adjusted CI (9). Furthermore, in our prior research on ASAP and ASAP2, we found that if the nonspaced batch means {
j(m)} obtained by taking s=0 in (1) have lag-one correlation 
(m) that substantially exceeds 0.8, then in the no precision case CIs delivered by ASAP or ASAP2 can be extremely unstable, with excessive values for the mean, variance, or coefficient of variation of the CI half-lengths. An explanation for this phenomenon is detailed in Appendix A of Steiger et al (2005). To avoid this phenomenon, the ASAP3 algorithm was designed to include a method that uses the arc sine transformation of 
(m), the standard estimator of 
(m), to check the condition that a 97.5% upper confidence limit for sin-1[
(m)] does not exceed the threshold sin-1(0.8).
Based on all the foregoing considerations, the SBatch algorithm applies a similar correlation test to the approximately normal, spaced batch means. In particular, we have found that SBatch delivers reasonably stable, well-behaved CIs if the batch size m is taken sufficiently large to ensure that 
To test hypothesis (18) at the significance level
corr=0.025, we seek a one-sided upper CI for 
(m, s) that is based on 
(m, s) as defined in (6) and that, with probability no less than 1-
corr=0.975, falls at or below the limit 0.8 when (18) holds. Since k'
68 on every iteration of SBatch, we use the approximation

(Steiger et al, 2005, p 52) to test hypothesis (18) at the significance level
corr=0.025 by checking for the condition that a 100(1-
corr)% one-sided upper confidence limit for sin-1[
(m, s)] does not exceed the threshold sin-1(0.8)=0.927. If on a particular iteration of the correlation test within SBatch we find

where z1-
corr=z0.975=1.96, then we conclude that the current batch size m satisfies (18); and we proceed to the construction of a correlation-adjusted CI for
X in the next step of SBatch.
If condition (19) is not satisfied, then we increase the batch size m and the overall sample size n according to

Additional simulation-generated observations are collected if necessary and a new set of spaced batch means is computed according to (1) using the new batch size m and the final spacer size s determined in the randomness test. Then hypothesis (18) is retested.
4.5. Computing the correlation-adjusted CI
At this point, we have a time series of k' spaced batch means that is approximately a stationary Gaussian process. This time series can be used to construct a CI for
X that has been adjusted to compensate for the remaining (nonexcessive) correlations between the k' spaced batch means for batches of the current size m. The correlation adjustment A has form (5) (where in the computing formulas (3) and (6) we take k=k') so that a correlation-adjusted 100(1-
)% CI for the steady-state mean is then given by (9). The justification for (9) is given in the Appendix.
4.6. Fulfilling the precision requirement
Let H denote the half-length of the CI (9). If (9) satisfies the precision requirement (10), where H* is given by (11), then SBatch terminates, delivering the CI (9). If the precision requirement (10) is not satisfied, then we estimate k*, the total number of spaced batches of the current batch size m that are needed to satisfy (10),

Thus k*(s+m) is our latest estimate of the total sample size needed to satisfy the precision requirement; and an upper bound of 1024 is imposed on the actual spaced batch count k' for the next iteration of SBatch.
The new batch size m for the next iteration of SBatch is assigned according to

so that the total sample size n is increased approximately by the factor (H/H*)2. On the next iteration of SBatch, the total sample size, including the warm-up period, is thus given by

where s was finalized in the randomness test. The additional observations are obtained by restarting the simulation or by retrieving extra data from storage; then the next iteration of SBatch is performed by computing a new CI for
X using (9).
5. Experimental results
To evaluate the performance of SBatch with respect to the coverage probability of its CIs, the mean and variance of the half-length of its CIs, and the associated sample sizes, we applied SBatch together with ASAP3 and WASSP to a suite of carefully selected test problems. The experimental design includes some problems typically used to 'stress-test' simulation analysis procedures and some problems more closely resembling real-world applications. The steady-state mean response is available analytically for these test problems; thus we were able to evaluate the performance of SBatch, ASAP3, and WASSP in terms of actual versus nominal coverage probabilities for the CIs delivered by each of these procedures.
Our experiments included 1000 independent replications of SBatch and WASSP and 400 independent replications of ASAP3 to construct nominal 90 and 95% CIs that satisfied four different precision requirements. For the case of no precision requirement, we took H*=
in (10) so that SBatch delivers the CI (9) using the batch count and batch size required to pass the randomness, normality, and correlation tests. For the cases of the precision requirements
15,
7.5, and
3.75%, we continued the simulation of each test problem until SBatch delivered a CI of the form (9) that satisfied the stopping criterion in (10) and (11) with r*=0.15, 0.075, and 0.0375, respectively.
For each CI that was replicated 400 (respectively, 1000) times, the standard error of the coverage estimator for CIs with nominal 90% coverage probability is approximately 1.5% (respectively, 0.95%); and for CIs with nominal 95% coverage probability, the standard error of the coverage estimator is approximately 1.1% (respectively, 0.69%). As explained below, these levels of precision in the estimation of coverage probabilities turned out to be sufficient to draw meaningful conclusions about the performance of SBatch compared with that of ASAP3 and WASSP.
5.1. M/M/1 queue waiting time process
For the first test problem, we let Xu denote the waiting time in the queue for the uth customer (u=1, 2, ...) to arrive after time 0 at a single-server queueing system that has the following operational characteristics: i.i.d. exponential interarrival times with mean 10/9 (so that the arrival rate
=0.9); i.i.d. exponential service times with mean 1 (so that the service rate
=1); and steady-state server utilization
=
/
=0.9. We used two different initial conditions:
- In the first set of experiments, we used the empty-and-idle initial condition (so that X1=0) primarily because this is the standard initial condition used to evaluate the performance of virtually all steady-state simulation analysis procedures.
- In the second set of experiments, we deliberately induced a large initial transient in the series of simulation-generated waiting times by loading the system with 113 initial customers at time zero, with the interarrival time to the first regular customer after time 0 being sampled from the exponential distribution having mean 10/9 as usual.
The collection of queue-waiting-time statistics began with the first regular customer arriving after time 0. In steady-state operation, the mean and standard deviation of a customer's waiting time in the queue are
X=9.0 and
X=11.6534, respectively. On the other hand, if N(t) denotes the number of customers in the system at time t, then it can be shown that given the initial number of customers N(0)=c, the conditional moment generating function of the queue waiting time for the first regular customer is

where 

/(
+
). From (20) it follows that

where the expression for (d2/d
2)MX1 (
) in (21) is too complicated to display in (21) but is easily evaluated using Maple (Maplesoft, 2003). Thus in the first and second set of experiments on the M/M/1 queue, we have

respectively. The properties displayed in (22) constitute good evidence that the queue-waiting-time process was contaminated by severe initialization bias in the second set of experiments.
Beyond the difficulties caused by initialization bias, the M/M/1 queue waiting time process is a particularly difficult test problem even in steady-state operation for the following reasons: (i) the autocorrelation function (ACF) of the waiting time process decays very slowly with increasing lags; and (ii) the marginal distribution of waiting times has an exponential tail and is therefore markedly non-normal. Thus we can expect slow convergence to the classical requirement that the batch means are i.i.d. normal. This test problem clearly reveals one of the principal advantages that SBatch shares with ASAP3—namely, that SBatch does not require the final batch means to be i.i.d. normal or even approximately so.
Table 1 summarizes the experimental performance of the procedures SBatch, ASAP3, and WASSP when they were applied to the waiting times in the M/M/1 queue with server utilization 0.9 and empty-and-idle initial condition. As an example of a test problem that is more nearly typical of practical applications, Table 2 summarizes the results for the M/M/1 queueing system with the empty-and-idle initial condition, a mean interarrival time of 1.0, and a mean service time of 0.8. Thus in steady-state operation, this system has a server utilization of
=0.8 and a mean waiting time in the queue of
X=3.2. As can be seen from these tables, all three procedures performed reasonably well. From the results in Tables 1 and 2, we concluded that as the precision level r* became progressively smaller, both SBatch and ASAP3 delivered CIs whose coverage probabilities converged to their nominal levels, while WASSP delivered CIs with some overcoverage; moreover, in this situation WASSP appeared to require substantially larger sample sizes than were required by SBatch or ASAP3.
Table 1 - Performance of SBatch, WASSP, and ASAP3 in the M/M/1 queue waiting time process with 90% server utilization and empty-and-idle initial condition.
Table 2 - Performance of SBatch, WASSP, and ASAP3 in the M/M/1 queue waiting time process with 80% server utilization and empty-and-idle initial condition.
To illustrate the robustness of SBatch against severe initialization bias, Table 3 summarizes the experimental performance of SBatch and ASAP3 when both procedures were applied to queue waiting times in the M/M/1 queue with server utilization 0.9 and 113 customers in the system at time 0. In the cases of no precision and
15% precision, both SBatch and ASAP3 required sample sizes roughly twice as large as those required in the comparable experiment involving the empty-and-idle initial condition. In the cases of
7.5 and
3.75% precision, SBatch required sample sizes that were 15-40% larger than those required for the comparable experiment with an empty-and-idle initial condition. On the other hand, in these cases, ASAP3 required nearly the same average sample sizes for both initial conditions. With respect to closeness of conformance to the user-specified requirements on CI precision and coverage probability, both SBatch and ASAP3 performed well, delivering CIs with nearly the specified levels of relative precision and coverage.
Table 3 - Performance of SBatch and ASAP3 in the M/M/1 queue waiting time process with 90% server utilization and 113 customers initially in the system.
5.2. First-order autoregressive (AR(1)) process
If

is a white noise process, then a first-order autoregressive (AR(1)) process {Xu: u=1, 2, ...} with the starting value X0 can be generated as follows:

where
X is the mean and
is the lag-one correlation of the process in steady-state operation. We set the mean
X=100, the autoregressive parameter
=0.995, and the white noise variance 
2=1; and we set X0=0 to obtain the analogue for the AR(1) process of an 'empty-and-idle' initial condition for the M/M/1 queue. The most difficult aspects of this test process are its exceptionally long initial transient period and its persistent autocorrelation structure. On the other hand, the batch means computed from this process are always multivariate normal.
Table 4 shows a comparison of the performance of SBatch, WASSP, and ASAP3 for the given AR(1) process. In many cases, the actual precision of the CIs delivered was significantly smaller than the requested level. For example, in the case of nominal 90% CIs with no precision requirement, SBatch delivered CIs that had average relative precision of 2.15%. In fact, for the no precision case all three analysis procedures delivered average levels of relative precision substantially below 7.5%; and thus the results for the
15 and
7.5% cases were essentially the same as for the no precision case. Therefore, to show a meaningful side-by-side comparison of the performance of SBatch with that of ASAP3 and WASSP in this test process, we chose to display in Table 4 the results for the following levels of precision: no precision,
3.75,
1.875, and
0.9375%.
From the results in Table 4, we concluded that SBatch outperformed both ASAP3 and WASSP for this test problem. At all four precision levels, SBatch required smaller sample sizes than ASAP3 required; and as the precision level became progressively smaller, SBatch also required smaller sample sizes than WASSP required. Furthermore at all four levels of precision, the CIs delivered by SBatch had empirical coverage probabilities close to their nominal levels while both ASAP3 and WASSP delivered CIs that exhibited significant overcoverage, especially at the
1.875 and
0.9375% precision levels.
To further validate the test for correlation in the SBatch algorithm as described in Section 4.4, we also examined the performance of SBatch for the AR(1) process (23) with autoregressive parameter
=-0.995 so that the ACF alternated in sign and converged slowly to zero. For 1000 replications of nominal 90% CIs with no precision requirement, the CI coverage was 93.5%, the average sample size was 17 673, the average CI half-length was 0.0096, and the variance of the CI half-length was 0.00001. Because the delivered average level of relative precision for this case was less than 0.9375%, the results for the
3.75,
1.875, and
0.9375% precision levels were nearly the same as for the no precision case. From these results, we concluded that the test for correlation provided SBatch with adequate protection against extreme autocorrelation structures that could cause degradation in the performance of the procedure.
5.3. M/H2/1 queue waiting time process
As another example of a test problem that is typical of practical applications, Table 5 summarizes our results for the queue waiting time process for the M/H2/1 queueing system. This system has an empty-and-idle initial condition, a mean interarrival time of 1.0, and a hyperexponential service time distribution that is a mixture of two exponential distributions such that the service times have a mean of 0.8 and a coefficient of variation of 2.0. Thus in steady-state operation this system has a server utilization of
=0.8 and a mean queue waiting time of
X=8.0. From the results in Table 5, we concluded that all three output analysis methods performed well for this test problem. In particular, SBatch and ASAP3 exhibited similar CI coverages in the cases for which a precision requirement was specified.
Table 5 - Performance of SBatch, WASSP, and ASAP3 in the M/H2/1 queue waiting time process.
5.4. AR(1)-to-Pareto (ARTOP) process
The 'AR(1)-to-Pareto,' or ARTOP process, is defined as follows. Let {Zu: u=1, 2, ...} be a stationary AR(1) process with N(0, 1) marginals and lag-one correlation
, which can be generated by the relation Zu=
Zu-1+
u, where Z0
N(0, 1) and
is a white noise process with variance 
2=1-
2. If {Xu: u=1, 2, ...} is an ARTOP process with marginal c.d.f.

where
>0 is a location parameter and
>0 is a shape parameter, then we generate {Xu} from the 'base process' {Zu} as follows. For all real z, let
denote the c.d.f. of the N(0, 1) distribution. Taking
for u=1, 2, ..., we obtain a sequence of correlated random numbers {Ru: u=1, 2, ...}; and then applying the inverse of the Pareto c.d.f. (24), we finally obtain {Xu: u=1, 2, ...} so that

The mean and the variance of the ARTOP process (25) are given by
X=E[Xu]=
(
-1)-1 (for
>1) and
X2=
2
(
-1)-2(
-2)-1 (for
>2), respectively (Lada et al, 2007).
We set the parameters of the Pareto distribution (24) according to
=2.1 and
=1; and we set the lag-one correlation in the base process {Zu} to
=0.995. This yields a test process whose marginal distribution has mean, variance, skewness, and kurtosis, respectively, given by
X=1.91,
X2=17.4, E{[(Xu-
X)/
X]3}=
, and E{[(Xu-
X)/
X]4}=
. The most difficult aspects of this test process are its highly non-normal marginals and persistent autocorrelation structure. We took Z0
N(0, 1) so that the process started in steady-state operation and therefore had no warm-up period.
The experimental results for this ARTOP process are summarized in Table 6. For the cases in which there was no precision requirement specified and for the
15% precision case, SBatch and ASAP3 had comparable results in terms of CI coverage while WASSP had significant undercoverage. Also, SBatch required smaller sample sizes than ASAP3 on average for the no precision case and the
15% precision level. For the
7.5% precision case, the three procedures had similar CI coverages, all slightly below the nominal level.
In an attempt to gather some meaningful evidence about the asymptotic performance of SBatch as the precision requirement approaches zero, we performed 1000 replications of SBatch and 400 replications of ASAP3 at the
3.75% precision level. For nominal 90 and 95% CIs delivered by SBatch, the CI coverages were 86.3 and 93.7%, respectively. For ASAP3, the corresponding CI coverages were 88.8 and 91.0%, respectively. On the other hand, the average sample sizes required by SBatch were about 86% larger than those required by ASAP3. Although the coverage probabilities of the CIs delivered by SBatch and ASAP3 appeared to approach their nominal levels as the precision requirement tended to zero, the rate of convergence seemed to be slower for the ARTOP process than for many of the other test problems.
It is important to note that only 400 replications of WASSP and ASAP3 were performed for this test case, compared to 1000 replications for SBatch. During our experimentation with this test problem, we have occasionally observed cases where the required sample sizes were extremely large, and it is possible that 400 replications is not sufficient to see this behaviour. For example, if we take just the first 400 replications of nominal 90% CIs generated by SBatch for the 7.5% precision case, we get a CI coverage of 84.5%, an average data size of 289 474, and a maximum data size of 3 271 680. These results are comparable to those for ASAP3, where the maximum data size required by ASAP3 for the 7.5% precision level was 2 873 344. For 1000 replications of the same test case, SBatch required a maximum sample size of 7 932 928 observations.
6. Conclusions and recommendations
SBatch is a completely automated procedure for steady-state simulation analysis, delivering an approximate CI estimator for the mean response that satisfies user-specified precision and coverage-probability requirements. Compared with its predecessors ASAP3 and WASSP, SBatch appears to have the following advantages:
- It is much simpler to understand and implement than either ASAP3 or WASSP.
- Its coverage probabilities compare favourably with those of ASAP3 and WASSP, especially in avoidance of significant undercoverage or overcoverage.
- Its required sample sizes are generally much smaller than the sample sizes required by WASSP while being roughly comparable to the sample sizes required by ASAP3.
- Its required execution times are substantially smaller than the execution times required by the latest implementations of WASSP or ASAP3—at least in all the test problems examined so far.
Although all the experimental results obtained with SBatch appear to be promising, the theoretical basis for the procedure requires substantial development. Our current work on SBatch is focused on the following areas:
- Establishing the asymptotic validity of the final CI (9) delivered by SBatch,

provided that (26) can in fact be proved when SBatch is applied to some interesting class of simulation output processes. - Establishing the asymptotic sampling efficiency of SBatch, provided that such results can in fact be appropriately formulated and proved when SBatch is applied to some interesting class of simulation output processes.
- Extending the performance evaluation of SBatch to include other discrete-event stochastic systems that exhibit characteristics typical of different types of large-scale simulation applications.
As mentioned in item 2 above, the issue of the asymptotic sampling efficiency of SBatch requires some elaboration. If we have the 'ideal' situation in which the target output process {Xi: i=1, 2, ...} is stationary and Gaussian with the known steady-state variance parameter

(where
(n) denotes the usual sample mean of the first n observations), and if the series on the far right-hand side of (27) is absolutely convergent so that
X is well defined, then the nominal 100(1-
)% CI for
X,

(where z1-
/2 denotes the 1-
/2 quantile of the standard normal distribution), is asymptotically valid in the sense that its coverage probability tends to the correct level 1-
as the sample size n grows. It follows from (27), (28) that in this ideal situation, an efficient procedure (in the sense of Chow and Robbins, 1965; Nádas, 1969) for computing a 100(1-
)% CI for
X with relative precision r* will require a total sample of size

and as r*
0, the resulting CI,
, will have a coverage probability that approaches the correct limiting value 1-
. A parallel development yields nA*(H*)=z1-
/22
X/(H*)2, the sample size required to deliver an asymptotically valid 100(1-
)% CI for
X with absolute precision H* in the ideal situation that
X is known.
If NR(r*) (respectively, NA(H*)) denotes the sample size required by SBatch to deliver a nominal 100(1-
)% CI for
X with relative (respectively, absolute) precision r* (respectively, H*), then establishing the asymptotic efficiency of SBatch in the sense of Chow and Robbins requires showing that

provided that (29) holds when SBatch is applied to an interesting class of steady-state simulation output processes. At a minimum, some attempt at experimental verification of (29) should be carried out for the test problems examined in this article.
To provide more comprehensive evidence of SBatch's performance in practice as mentioned in item 3 above, we must apply SBatch to a suite of communication-, inventory-, queueing-, and production-system simulations having realistic levels of complexity, congestion, and workstation utilization that are typical of a broad class of steady-state simulation applications. Additional theoretical and experimental results, follow-up papers, and software for SBatch will be available on the website www.ise.ncsu.edu/jwilson.
References
- Chow YS and Robbins H (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean. The Ann Math Stat 36: 457–462. | Article |
- Fishman GS and Yarberry LS (1997). An implementation of the batch means method. INFORMS J Comput 9: 296–310. | Article |
- Fox B, Goldsman D and Swain JJ (1991). Spaced batch means. Opns Res Lett 10: 255–263. | Article |
- Lada EK and Wilson JR (2006). A wavelet-based spectral procedure for steady-state simulation analysis. Eur J Opl Res 174: 1769–1901. | Article |
- Lada EK and Wilson JR (2007). SBatch: A spaced batch means procedure for simulation analysis. In: Henderson SG, Biller B, Hsieh M-H, Shortle J, Tew JD and Barton RR (eds). Proceedings of the 2007 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 463–471, http://www.informs-sim.org/wsc07papers/055.pdf, accessed 13 April 2008.
- Lada EK, Wilson JR and Steiger NM (2003). A wavelet-based spectral method for steady-state simulation analysis. In: Chick SE, Sánchez PJ, Ferrin D and Morrice DJ (eds). Proceedings of the 2003 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 422–430, http://www.informs-sim.org/wsc03papers/052.pdf, accessed 13 April 2008.
- Lada EK, Wilson JR, Steiger NM and Joines JA (2007). Performance of a wavelet-based spectral procedure for steady-state simulation analysis. INFORMS J Comput 19: 150–160. | Article |
- Maplesoft (2003). Maple 9 Learning Guide. Waterloo Maple Inc.: Toronto.
- Nádas A (1969). An extension of a theorem of Chow and Robbins on sequential confidence intervals for the mean. Ann of Math Stat 40: 667–671. | Article |
- Royston JP (1982). Algorithm AS 181: The W-test for normality. Appl Stat 31: 176–180. | Article |
- Royston P (1993). A toolkit for testing for non-normality in complete and censored samples. Statistician 42: 37–43. | Article |
- Royston P (1995). Remark AS R94: A remark on algorithm AS 181: The W-test for normality. Appl Stat 44: 547–551. | Article |
- Shapiro SS and Wilk MB (1965). An analysis of variance test for normality (complete samples). Biometrika 52: 591–611. | ISI |
- Steiger NM et al (2004). Steady-state simulation analysis using ASAP3. In: Ingalls RG, Rossetti MD, Smith JS and Peters BA (eds). Proceedings of the 2004 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 672–680, http://www.informs-sim.org/wsc04papers/081.pdf, accessed 13 April 2008.
- Steiger NM et al (2005). ASAP3: A batch means procedure for steady-state simulation analysis. ACM Trans on Model Comput Simul 15: 39–73. | Article |
- von Neumann J (1941). Distribution of the ratio of the mean square successive difference to the variance. Ann Math Stat 12: 367–395. | Article |
Appendices
Appendix: Justification for correlation-adjusted CI (9)
Steiger et al (2005) find that for adjacent (nonspaced) batch means (ie, s=0), if the batch size m is sufficiently large to satisfy (18), then the stochastic behaviour of the truncated, nonspaced batch means

is closely approximated by that of a stationary AR(1) process,

for j=1, 2, ..., where the autoregressive parameter 
(m) satisfies |
(m)|<1 and the residuals {
j: j=1, 2, ...} are i.i.d. normal with mean zero and variance
2
(m)[1-
2
(m)]. In this situation the ACF of the nonspaced batch means is given by

for lag ℓ=0,
1,
2,...
To compute an approximately unbiased estimator of
, we derive an approximation to the ACF of the spaced batch means based on the ACF (A.1) of the nonspaced batch means. If spacer size is an integral multiple of the batch size so that s=rm for some positive integer r, then it follows from (A.1) that

In view of (A.2), we make the key assumption that for any non-negative integer value of s, we can take r=s/m (a rational number) to obtain the approximate result

for ℓ=0,
1,
2,.... Letting 
(m, s)

(m, s)(1)=
(m)(m+s)/m and taking ℓ=1 in (A.3), we have 
(m)
m/(m+s)
(m, s); and inserting this latter result back into (A.3), we have

for l=0,
1,
2, ....
Thus the assumption that the nonspaced batch means constitute an AR(1) process coupled with the approximation (A.3) for any positive integer values of m and s yields the same type of exponentially decaying ACF for the spaced batch means as for the nonspaced batch means. Note, moreover, that the lag-one autocorrelation 
(m,s) of the spaced batch means depends on both the batch size m and the spacer size s.
It is easily proved that

and in the following analysis, we use the approximation

From (A.4), (A.5), and standard results for the AR(1) process, we have

for large k; and thus (A.6) is the basis for the sample estimator

Thus the half-length H of (9) is appropriate for a CI centered at
; and since
has the same mean and smaller variance than that of
, it follows that the CI (9) will have coverage no less than that of the CI
.
Acknowledgements
Partial support for our research was provided by National Science Foundation Grant DMI-9900164. We thank Christos Alexopoulos, David Goldsman, and Ali Tafazzoli for many enlightening discussions on the operation of SBatch. We also thank the anonymous reviewers for several suggestions that substantially improved the paper.
MORE ARTICLES LIKE THIS
These links to content published by Palgrave Macmillan are automatically generated.
RESEARCH
SBatch: A spaced batch means procedure for steady-state simulation analysisJournal of Simulation Article
Cross-trained versus specialized agents in an inbound call centre: a simulation-based methodology for trade-off analysisJournal of Simulation Article




