Article

Journal of Simulation (2008) 2, 29–40. doi:10.1057/palgrave.jos.4250035

On the interaction between stratification and control variates, with illustrations in a call centre simulation

P L'Ecuyer1 and E Buist1

1GERAD, CIRRELT and DIRO, Université de Montréal, Canada

Correspondence: P L'Ecuyer, Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montreal, Quebec, Canada, H3C 3J7. E-mail: lecuyer@iro.umontreal.ca

Received 13 March 2007; Accepted 19 October 2007.

Top

Abstract

Variance reduction techniques (VRTs) are often essential to make simulation quick and accurate enough to be useful. A case in point is simulation-based optimization of complex systems. An obvious idea to push the improvement one step further is to combine several VRTs for a given simulation. But such combinations often give rise to new issues. This paper studies the combination of stratification with control variates. We detail and compare several ways of doing the combination. Nontrivial synergies between the two methods are exhibited. We illustrate this with a telephone call centre simulation, where we combine a control variate with stratification with respect to one of the uniform random variates that drive the simulation. It turns out that using more information in the control variate degrades the performance (significantly) in our example. This seemingly paradoxical behaviour is not rare and our theoretical analysis explains why.

Keywords:

variance reduction, control variates, stratification, call centres

Top

1. Introduction

The use of simulation to optimize decision parameters in complex stochastic systems is increasingly frequent. This simulation-based optimization typically requires thousands or millions of simulation runs for a complex model, where each run takes a significant amount of time. Consider for instance a telephone call centre for which we want to optimize the number of agents who talk with customers over the phone, and the working schedules of these agents, under constraints on the quality of service and on admissible schedules. Large call centres are complex stochastic systems that can be analysed realistically only by simulation; tractable queuing models oversimplify reality and are not very reliable. When simulation is combined with an optimization algorithm, simulation speed is a key issue because optimization often requires huge numbers of simulation runs at different parameter settings (Atlason et al, 2004; Cez cedilik and L'Ecuyer, 2008). In that context, straightforward (or naive) Monte Carlo simulation is often too slow to be practical.

Fortunately, proper use of variance reduction techniques (VRTs) such as control variates (CVs), stratification, conditional Monte Carlo, common random numbers, importance sampling, etc., can improve simulation efficiency, sometimes by a large factor (Bratley et al, 1987; Fishman, 1996; Glynn, 1994). For larger improvements, an obvious idea is to use two or more VRTs at the same time. However, this often complicates things in an unexpected way. Such combinations are studied in Cheng (1986), Booth and Pederson (1992), Avramidis and Wilson (1996), and Hickernell et al (2005), for example, in specific settings.

The aim of this paper is to examine some issues that arise when combining two specific VRTs and to show how to handle these issues. We do this via an example of a call centre simulation, to make things more concrete for the reader, but our development applies more generally. We study the combination of CVs with stratification with respect to a continuous input variable. In this case, the optimal CV coefficient turns out to be a function of the input variable on which we stratify. We focus on how to approximate this function in practice.

The next section discusses how stratification with respect to uniform random numbers driving the simulation can be used to reduce the variance. We then study the combination of a CV with stratification, which is non-standard and requires some care. Then, we give an example of a simple call centre on which we perform numerical experiments to compare the various ways of making the combination. The simulations were made with contact centres, a specialized simulation tool for ContactCentres (Buist and L'Ecuyer, 2005) developed in Java with the SSJ library (L'Ecuyer and Buist, 2005). A preliminary version of this paper was presented at the 2006 Winter Simulation Conference (L'Ecuyer and Buist, 2006).

Top

2. Stratification

Stratified sampling consists in partitioning the set of possible outcomes in a finite number of strata, estimating the quantity of interest separately in each stratum, and computing a weighted average of these estimators, where the weights are the (known) probabilities of the corresponding strata, to obtain the overall estimator. This is easy to implement if we can design strata for which we know the exact probabilities and from which we know how to generate samples uniformly. Bratley et al (1987, p 295) give an example with three strata. For large and complex simulations, it may not be obvious a priori how to achieve this. One way of stratifying a simulation is as follows.

Recall that all the randomness in a simulation typically comes from a sequence of independent U(0, 1) (uniform over the interval (0, 1)) random variates. Select d of those uniforms, preferably some whose values are deemed to have a large impact on the result. Partition the d-dimensional unit hypercube [0, 1)d into k rectangular boxes of the same shape and size; these boxes will correspond to the k strata. Each one has probability 1/k. To generate a sample uniformly from stratum s, we generate a point U uniformly in box s and take the d coordinates of U as the values of the d selected uniforms. All other random variates in the simulation are generated as usual, independently of the realizations of the d selected uniforms.

Suppose each simulation run provides an estimator X for Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author. Suppose also that we have ns observations in stratum s for each s, where the ns's are positive integers such that n=n1+...+nk. If Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author denote the ns i.i.d. copies of X in stratum s, the (unbiased) stratified estimator of mu is (Cochran, 1977):

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

is the sample mean in stratum s. Let sigmas2=Var[X|S=s], the conditional variance of X given that we are in stratum s. Then,

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

and an unbiased estimator of this variance is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

where ital sigma circs2 is the sample variance of Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author, assuming that nsgreater than or equal to2.

Stratification with proportional allocation takes ns=n/k for all s. Then, (2) simplifies to

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

where X macrsp, n denotes the corresponding version of (1). The optimal allocation, which minimizes the variance (2) with respect to n1, ..., nk under the constraints that ns>0 for each s and n1+...+nk=n for a given n, is easily found by using a Lagrange multiplier; we must take ns proportional to sigmas/k: ns*=nsigmas/sigma macrk where sigma macr=Sigmas=1ksigmas/k. (We neglect the rounding of ns* to an integer and assume that ns>2.) If X macrson denotes the estimator with optimal allocation, we have Var[X macrso, n]=sigma macr2/n. Putting these pieces together, the variance can be decomposed as follows (Cochran, 1977):

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The first sum in the last line represents the variability due to the different standard deviations among strata and the second sum represents the variability due to the differences between stratum means. Proportional allocation eliminates the last sum while optimal allocation eliminates the first. For a given total sample size n, a larger k generally gives more variance reduction, because the strata are smaller so there is less variability within the strata. When k right arrow infinity, we have sigma macr right arrow integral[0, 1)d sigma(u)du where sigma2(u)=Var[X|U=u]. Usually, sigma macr>0, in which case the marginal variance reduction converges to zero. On the other hand, with a larger value of n/k (a smaller k), we have a more accurate estimator of the variance of the stratified estimator.

Top

3. Combining with a control variate

CVs for simulation are discussed, for example, by Lavenberg and Welch (1981) and Glynn and Szechtman (2002). Here we study how to combine a CV with stratification. To keep the notation simple, we now assume that d=1 and we consider a single control variable, but all our development can be generalized easily to d>1 and to a vector of CVs. The one-dimensional uniform random vector U is denoted by U. Any random variable A whose expectation Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author is known can be used as a CV. Preferably, A should be strongly correlated (positively or negatively) with X. Without the stratification, the CV is used by subtracting from the original estimator X the difference Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author multiplied by some constant coefficient beta. The (unbiased) CV estimator is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The optimal coefficient beta is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

and we have Var[Xc]=(1-rho2[XA])Var[X] when beta=beta*, where rho[XA] is the linear correlation between X and A. This beta* can be estimated from preliminary (pilot) simulation runs or from the same runs as X; in the latter case, this gives a slightly biased estimator, but the bias is negligible when the number n of runs is large.

Things become somewhat more complicated if we combine the CV with stratification, because both beta* and the expected value of A generally depend on the strata, or on the value taken by the random variate on which we stratify. We examine and compare various ways of handling this, assuming that we are stratifying on U as in the previous subsection. We can apply the CV on the stratified average X macrsn, or on each stratum average mucircs, or on the individual observations Xsi. All these methods are equivalent to replacing Xsi by

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

with different choices of bsi and esi, where Asi is the value of the CV for the observation Xsi, in stratum s. Let Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author, the expected value of A given that we are in stratum s, and Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author, the expected value of A conditional on U=u. In (8), when U=usi, we can take esi as either aas or a(u). We can also take bsi as either a common constant beta, or a different constant betas in each stratum s, or a function of u, beta (u). We examine and compare these possibilities.

If bsi does not depend on more information than esi, then (8) is unbiased; otherwise it can be biased. So if esi=as, we cannot take bsi=beta(U), whereas if esi=a, we must have bsi=beta (a constant). To show unbiasedness, we take the conditional expectation given the stratum s if esi=as and given U if esi=a(U). For example, if esi=a(U) and bs, i=betas, then Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author, but this no longer works if we take as together with beta(U).

Table 1 summarizes the different combinations. Each table entry gives the 'correction term' bsi(Asi-esi) used in (8) for the given combination. The dashed entries correspond to biased estimators. On each row, the best estimator is the one on the diagonal. As we shall see later, none of these three diagonal entries always give a smaller variance than the other two, even if we use the optimal CV coefficient in each case. We will examine each of them in more detail.


A common coefficient beta, with esi=as. We define A macrsn as the weighted average of the n replicates of A, in the same way as X macrsn in (1):

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

Then, Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author. Using A macrsn as a CV with a single coefficient beta and esi=as gives the estimator

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

For the choice of beta, a first (naive) approach is to use the beta* defined earlier, as if there was no stratification. However, this beta* is no longer optimal, as we now show.

The estimator (9) has variance

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

Differentiating with respect to beta and equaling the derivative to zero, we find that the variance is minimized by taking

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

Here, Cov[Xs, i, Asi] and Var[Asi] are the conditional covariance and variance given that we are in stratum s. Our second combined estimator uses (9) with beta=betasc*. This betasc*. generally differs from beta* and it also depends on the allocation used for the stratification. As a result, minimizing Var[X macrsc,n] requires finding betasc*. and the optimal allocation (the ns's) simultaneously (which is not necessarily easy). If we restrict ourselves to proportional allocation, the ns's simplify and we obtain

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

Taking bsi=betas with esi=as. We now consider a different CV coefficient betas in each stratum. We replace Xsi by Xscsi=Xsi-betas(As, i-as) for each s and i, so the average mucircs is replaced by

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

where Âs is the average of the Asi's in stratum s. We assume that we can compute

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

with negligible error (eg by numerical integration). The variance in stratum s becomes

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

and the optimal betas for stratum s is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

With betas=betasc, s*, the variance in stratum s is reduced to

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The overall variance here with betasc, s* cannot be larger (and is usually smaller) than if we impose betas=beta for all s, because we have the flexibility to optimize the constant betas in each stratum. The difference can be large if the betasc, s* are far from being equal between strata. After estimating the variance with betasc, s* within each stratum, we can find the allocation that minimizes the overall controlled variance. Note that the optimal allocation with the CV, obtained using the sigmasc, s' s, differs from the optimal allocation without CV, obtained with the sigmas's defined earlier.

Taking esi=a(u). Once we know usi, the realization of U for the observation Xs, i, we can take esi=a(us,i) instead of as. The CV coefficient bsi can be any of the three possibilities: beta, betas, or beta(u). Clearly, the more flexibility we have, the better we can do, so an optimal choice of beta(u) (a function of u) is always at least as good as an optimal choice of betas (a function of s), and the latter is always at least as good as an optimal beta (a single constant). Note that the optimal values of these coefficients are not the same (in general) with esi=a(u) than with esi=as.

Let C(u)=A-a(u). Suppose the CV coefficient can be a function of U, bsi=beta(U). Let sigma2(u)=Var[X|U=u], Xsc(u)=X-beta(u)C(u) (the controlled estimator conditional on U=u),

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

(its conditional variance), Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author, and let Usi denote a random variable uniformly distributed over [(s-1)/ks/k). The variance of the controlled estimator in stratum s is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The choice of beta(u) affects only the first term in (17), that is, the expectation of the conditional variance. The optimal allocation takes ns proportional to (Var[Xsc(Usi)])1/2. Regardless of the allocation, the variance of the CV estimator is minimized by taking beta(u)=betasc*(u), where

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

With this optimal coefficient, the variance in stratum s is reduced to

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

If we impose the additional constraint that beta (u) must be a constant betas within each stratum, we have sigmasc2(u)=sigmasc, s2(u)=Var[X-betasC(u)|U=u] and the optimal betas for stratum s is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

Here the CV estimator is unbiased and the last equality holds because Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author. Obviously, with this additional constraint, we cannot get a smaller variance than with betasc*(u). And by imposing betas=beta for all s, we can only do worse.

In practice, the function betasc*(u) can be approximated by approximating the two functions q1(u)=E[C(u) X|U=u] and q2(u)=E[C2(u)|U=u]. These functions can be estimated from a sample {(UiCiXi), i=1, ..., n} of n realizations of (UC(U), X), and fitting a curve q circ1 to the points (UiCi(Ui)Xi) and another curve q circ2 to the points (UiCi2(Ui)). For example, we can fit a polynomial by interpolation or by least squares, or use a smoothing spline (de Boor, 1978).

To determine the optimal allocation, we need a good approximation of Var[Xsc(Usi)] for each s. This requires approximations of the functions sigmasc2(u), mu(u), and mus. Since mu=integral01 mu(u)du=Sigmask=1 mus/k, this demands more information than estimating mu. A possible shortcut might be to just use the variance estimates and the optimal allocation for the case where the CV coefficient is constant in each stratum. In practice, this should rarely introduce a significant error, especially when k is large.

Is esi=a(u) always better than esi=as? For esi=a(u), we have an ordering between beta*, betas* and beta*(u) in terms of variance reduction; we know that more flexibility in the choice of CV coefficient can only decrease the variance. But is a(u) with beta*(u) always better than as with betas*? At first sight, one might think yes, because a(u) exploits more information than as (Cs=A-as is the conditional expectation of C(U)=A-a(U) given that we are in stratum s). But on closer examination, we find that using a(u) might sometimes do worse! The following counterexample, suggested by Roberto Szechtman (private communication), shows that with the optimal CV coefficients, Var[Xsc(Usi)] can be either larger or smaller than Var[Xsc, si].

Example 1

Suppose X=A. With betas=betasc, s*=1, we get Xsc, si=as, so Var[Xsc, si]=0. On the other hand, Var[Xsc(Usi)]=Var[X-beta(u)C(u)] and we also have betasc*(u)=1. With this coefficient, we have Var[Xsc(Usi)]=Var[a(Usi)]>0 whenever Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author is not a constant inside stratum s. In this situation, Var[Xsc(Usi)]>Var[Xsc, si]. The larger the variation of Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author inside the stratum, the larger the variance of the second CV estimator. So the second estimator has a larger variance when there are fewer strata. When the number of strata increases to infinity, the variance of the second estimator converges to zero, which makes sense because the two estimators are identical in the limit.

For an example where Var[Xsc(Usi)]<Var[Xsc, si], take X=A-a(U). Then, betasc*(u)=1 and Xsc(Usi)=0, which has zero variance, whereas Var[Xsc, si]>0.

To compare (14) with (17) in general, with the stratum-dependent CV and coefficient, the variance in stratum s is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

With esi=a(u) and the optimal coefficient function betasc*(u), the variance in stratum s is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The estimator with a(U) has a smaller variance than the one with as in stratum s if and only if (22) is smaller than (21). Comparing the corresponding terms of (21) and (22), we always have

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

but we may have Var[mu(Usi)-betas(a(Us, i)-as)]less than or equal toVar[mu(Usi)]. In our numerical example later in this paper, it turns out that sigmasc, s2<Var[Xsc(Usi)] for this reason. In fact, by comparing (16) and (19), we see that taking a(U) gives a smaller variance than as in stratum s if and only if Asia(Usi) is more strongly correlated with Xsi than Asias.

From a practical viewpoint, it is easier to estimate the constants betasc, s* for a few strata than fitting a continuous function betasc*(u). And sometimes, it even gives a smaller variance. This will be illustrated in our numerical examples. On the other hand, when the number of strata is large, fitting the function might be easier than estimating the numerous constants betasc, s*. In the limit when the number of strata goes to infinity, the two schemes converge to each other.

Top

4. Practical issues

We summarize the required steps to implement the combined methods discussed so far, focusing on the case where esi=as and bsi=betasc, s*. The other schemes are obtained via easy adaptations. In each case, there are actually many ways of implementing the procedure; some require pilot runs (eg to estimate the optimal allocation in the stratification, and to estimate the optimal CV coefficient independently of the production runs) and there is also more than one way of doing the pilot runs. In the preceding analysis, we took a one-dimensional uniform U and a single CV, but our development extends directly to d-dimensional vectors of uniforms and to vectors of CVs. If d>1, s becomes the index of a d-dimensional box, and the integrals in (13) and (17) are over this box instead of over the interval [(s-1)/ks/k]. If the CV is a vector, then beta is also a vector, the covariances become matrices and vectors, and the correlation in (16) and (19) is replaced by a coefficient of determination between Xsi and the CV vector (Glynn and Szechtman, 2002).

The combined variance reduction method can be applied as follows.

  1. Select d and define the d-dimensional boxes on which to stratify. Most often, d would not exceed 1 or 2. When d>1, the boxes can be narrower in the dimension(s) deemed more important.
  2. (Optional) Perform pilot runs to estimate the optimal allocation and optimal CV coefficients. See the discussion below.
  3. Perform the ns simulation runs in stratum s, for each s. With proportional allocation, ns=n/k for each s. Estimate each CV coefficient betasc, s* from these runs if this was not done via pilot runs.
  4. Compute the combined estimator. The variance within each stratum can be estimated in the standard way for CV estimators (Glynn and Szechtman, 2002). The overall variance is simply a weighted average of the variances within strata, given by (2). This variance estimate can be used to compute a confidence interval for the mean.

If we decide to skip the pilot runs in step 2, we can simply use proportional allocation for the stratification, and estimate the optimal CV coefficients using data from the production runs of step 3. This would introduce a bias, especially if we do this estimation separately for each stratum and if the ns's are small. In the latter case, the estimators of betasc, s* will also be noisy. When there are many strata, a good idea is to approximate Cov[Xs, iAsi|Usi=u] and Var[Asi|Usi=u] by smooth functions of u, as discussed earlier with the functions q1(u) and q2(u) and then integrate these approximations over each box to obtain estimates of the two terms Cov[XsiAsi] and Var[Asi] in (15). These smooth approximations can be obtained by least-squares fitting, for example.

The advantage of performing pilot runs in step 2 is to give an unbiased estimator. These pilot runs are simulation runs that are independent from those in step 3. They are used only to estimate the variances and covariances that determine the optimal allocation and optimal CV coefficients. This can be achieved via smooth approximating functions of u, as we just discussed. For a given total computing budget, skipping the pilot runs and using the entire budget for step 3 usually provides a smaller mean square error, despite the small bias.

How should we choose the uniforms on which we stratify, in practice? The idea is to pick one or two uniforms that have a large impact on the overall variance. We want to make the last term in (5) as large as possible. Our case study in the next section will give an illustration. As another example, suppose that our estimator is a function of the sample path of a Brownian motion {B(t), tgreater than or equal to0} over a given time interval [0, T]. Then we may use one uniform to directly generate B(T), a second uniform to generate B(T/2) conditionally on (B(0), B(T)), and then generate the rest of the path conditionally on these three values. L'Ecuyer and Lemieux (2000) explain how to do that. We can stratify on these two uniforms, perhaps using narrower intervals for the first one. These two uniforms already provide a rough sketch of the sample path, and they typically account for a significant fraction of the variance (see L'Ecuyer and Lemieux (2000) for further details on this).

The choice of A can be guided by the examination of (16) and (19): we want to maximize the squared correlations in these expressions. Interestingly, one referee suggested that it might be a good idea to stratify on the CV itself (or the uniform used to generate it). But this choice always gives zero variance reduction in (19), because a(Usi)=Asi in that case! The correlation in (16) is also likely to be small. What we should look for instead is a CV (scalar or vector) that is highly correlated with X and conditional on U. Intuitively, this CV should bring information relevant to the prediction of X in addition to what is already known from U.

Top

5. A simple model of a call centre

5.1. The Model

Telephone call centres, and more generally contact centres where mail, fax, e-mail, and Internet contacts are handled in addition to telephone calls, are important components of large organizations (Gans et al, 2003). To illustrate the VRT ideas in this paper, we consider a simple model of a call centre where agents answer incoming calls. Real-life call centres often receive different call types and have separate groups of agents with different combinations of skills that enable them to handle only a subset of the call types. To simplify the presentation, we assume a single agent type and a single call type, but the model is otherwise inspired by real-life centres. The techniques examined in this paper should behave in a similar way with more complex centres and other similar types of queuing systems.

Each day, the centre operates for m hours. The number of (identical) agents answering calls and the arrival rate of calls vary during the day; we assume that they are constant within each hour of operation but depend on the hour. Let nj be the number of agents in the centre during hour j, for j=0, ..., m-1. If more than nj+1 agents are busy at the end of hour j, calls in progress are completed but new calls are answered only when there are fewer than nj+1 agents busy. After the centre closes, ongoing calls are completed and calls already in the queue are answered, but no additional incoming call is taken.

The calls arrive according to a Poisson process with piecewise constant rate, equal to Rj=Blambdaj during hour j, where the lambdaj are constants and B is a random variable with mean 1 that represents the busyness factor of the day. We suppose that B has the gamma distribution with parameters (alpha0alpha0), that is, with mean Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author and Var[B]=1/alpha0. This type of arrival process model is motivated and studied by Whitt (1999) and Avramidis et al (2004).

Incoming calls form an FIFO queue for the agents. A call abandons (and is lost) when its waiting time exceeds its patience time. The patience time of calls are assumed to be i.i.d. random variables with the following distribution: with probability p the patience time is 0 (so the person hangs up if no agent is available immediately), and with probability 1-p it is exponential with mean 1/v. The service times are i.i.d. gamma random variables with parameters (alphaitalic gamma), that is, with mean alpha/italic gamma and variance alpha/italic gamma2.

For a given time period (an hour, a day, a month, etc) and a given threshold s0, the fraction of calls arriving during that period and whose waiting time is less than s0 seconds (including those who abandoned before s0 seconds) is called the service level for that period, whereas the fraction of calls having abandoned is called the abandonment ratio. The service level is widely used as a measure of quality of service in call centres. For certain types of call centres that provide public service, it is regulated by law: The call centre operators may be charged a large fine if their service level goes below a given target; for example, 0.80 over each month for s0=20 seconds.

Here, we estimate these performance measures over an infinite time-horizon, that is, on average over an infinite number of days. Let A be the number of arriving calls during the day, G(s0) the number of those calls waiting less than s0 seconds (including those who abandoned before s0 seconds) for a given threshold s0, and L the number of calls having abandoned. The expected number of arrivals during the day is Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author. Its variance is Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author. Define Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author and Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author. These two quantities represent the steady-state service level, and abandonment ratio, respectively. Since a is known, here we will estimate only Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author and Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author.

We simulate the model for n days. For each day i, let Ai be the number of arrivals, Gi(s0) the number of calls who waited less than s0 seconds and Li the number of calls having abandoned. In what follows, we use Xi to represent either Gi(s0) or Li, and Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author to represent any of the two performance measures. A standard (or crude) unbiased Monte Carlo estimator of mu is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

with variance Var[X macrn]=Var[Xi]/n. We can estimate Var[Xi] by the empirical variance and a confidence interval can be computed as usual, using the normal approximation.

For our numerical illustrations, we take the following parameter values, where the time is measured in seconds: alpha0=10, p=0.1, nu=0.001, alpha=1.0, italic gamma=0.01 (so the mean service time is 100s), and s0=20. The centre starts empty and operates for 13 one-hour periods. The number of agents and the arrival rate in each period are given in Table 2.


We stratify on the uniform random variate U used to generate the busyness factor B by inversion: B=FB-1(U). As a CV, we use the number A of arrivals during the day. The mean and variance of A are Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author and Var[A]=1660+16602/10=277220. The expected number of arrivals conditional on U=u is a(u)=aFB-1(u) and the expected number of arrivals given that we are in stratum s is

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

where fB(b) is the density of B.

5.2. Variance estimates for different schemes

We perform a numerical experiment whose aim is to provide accurate estimates of all the terms in the variance decomposition (6) and other relevant constants, for each scheme. Instead of estimating these terms by the empirical variance of a few pilot runs, as we would normally do in an application, we did the following extensive (and more accurate) computations. We simulated 104 replications at U=uj=(j+0.5)/1000, for j=0, ..., 999. For each value of uj, we computed the busyness factor Bj=FB-1(uj), performed the runs, and then computed estimates of mu(uj), sigma2(uj), and Cov[XA|U=uj] based on the 104 runs, for each j. We then fitted a cubic smoothing spline to these points to obtain accurate approximations of the functions mu(u), sigma2(u), Cov[XA |U=u], and betasc*(u). Note that Var[A|U] can be computed exactly, since A has a known Poisson distribution conditional on u.

We integrated these functions numerically to approximate mus, sigmas2, Var[As,i], Cov[XsiAsi], betasc, s*, and betascp*, for each relevant pair (sk), as well as all other relevant constants such as musigma2, Cov[XA], and beta*. The value of Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author is known, but Var[A|S=s], the conditional variance of A given that we are in stratum s, must be estimated too. Based on these computations, we were able to compute all the numbers reported in Table 3 for the service level G(s0) and in Table 5 for L, the number of lost calls. We also have muapproximately1418.660 for G(s0) and muapproximately60.504 for L.


Figure 1 shows the behaviour of the optimal CV coefficient betasc*(u). for the estimation of Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author, as a function of u. This coefficient is decreasing in U and has the same sign as Cov[CX|U=u]. It is positive for small U and negative for large u. This can be explained as follows: when U is small, the load on the system is small and the agents are not very busy, so a small increase in the number of arrivals tends to increase G(s0), which makes the covariance positive. When U is large, on the other hand, the agents are occupied most of the time, so a few more arrivals increases the waiting time of several calls and tends to decrease the number of calls answered within s0 seconds, whence a negative covariance. Therefore, the CV must correct the estimator in a different direction depending on the value of u. When uapproximately0.99, the load on the system is so high that new arrivals do not affect much the number of calls waiting less than s0; in the limit, this number is zero (a constant) so Cov[XC|U=u] right arrow 0 while Var[C|U=u]=FB-1(u) right arrow infinity. Thus, betasc*(u) converges to 0 when u right arrow 1.

Figure 1.
Figure 1 - Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The function betasc*(u) for the number of calls waiting less than s0, approximated by smoothing cubic splines on 1000 points.

Full figure and legend (63K)

Figure 2 displays the function sigma2(u)=Var [X|U=u] as a function of u, again for the estimation of X=G(s0). When U increases (ie the arrival rate increases), the conditional variance first increases until it hits a sharp peak at uapproximately0.99, and then it decreases abruptly to zero. This decrease to zero is again due to the fact that when the arrival rate is too high (when U is too close to 1), practically no call is served within the time limit s0. The graph of Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author as a function of u, in Figure 3, confirms this abrupt convergence of mu(u) to 0 when u right arrow 1. This function increases for U up to about 0.86, and then it starts to decrease.

Figure 2.
Figure 2 - Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The function sigma2(u) for the number of calls waiting less than s0, approximated by smoothing cubic splines on 1000 points.

Full figure and legend (69K)

Figure 3.
Figure 3 - Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The function mu(u) for the number of calls waiting less than s0, approximated by smoothing cubic splines on 1000 points.

Full figure and legend (82K)

We made a similar experiment for the estimation of Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author, the expected number of lost calls. Figure 4 shows the behaviour of betasc*(u), which is always positive and increasing in this case (the correlation between L and A, conditional on u, is always positive). The functions sigma2(u), mu(u), and Cov[XA|U=u], have a very similar shape as betasc*(u).

Figure 4.
Figure 4 - Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

The function betasc*(u) for the number of lost calls, approximated by smoothing cubic splines on 1000 points.

Full figure and legend (79K)

Tables 3 and 5 report the values of the different terms in the variance decomposition, as a function of the number of strata, k. The following five schemes are considered; they all use the estimator in (8), with some CV C=Asi-esi and coefficient bsi:

  1. no CV, only stratification (bsi=0);
  2. the CV C=A-as with constant coefficient bsi=beta*;
  3. the CV C=A-as with constant coefficient bsi=betasc*;
  4. the CV C=A-as with coefficient bsi=betasc, s* in each stratum;
  5. the CV C=A-a(U) with coefficient bsi=betasc*(U).

The values of k range from 1 to 1000. The extreme case of k=1 corresponds to no stratification. We also consider the limit when k right arrow infinity. The (almost) exact means can be computed via Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author; they are Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author and Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author (recall that the mean number of arrivals is 1660).

Note that scheme (1) with k=1 is the classical Monte Carlo estimator. Schemes (2)–(4) with k=1 are all equivalent and they correspond to using CV only, without stratification.

When k right arrow infinity, the variance with proportional allocation becomes Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author, while the variance with optimal allocation converges to (integral01sigma(u)du)2, which provides a lower bound on the best that can be achieved by increasing k and using optimal allocation. The variance of the means and the variance of standard deviations across strata converge to Var[mu(U)]. and Var[sigma2(U)], respectively. The constant betascp* for scheme (3) converges to the ratio Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author. The two CV schemes (4) and (5) are equivalent in the limit, as discussed earlier.

The variation of the means across strata, (1/k)Sigmas=1k(musmu)2, depends only on k and not on the CV scheme; it is given in the second line of the table. It increases with k, first very quickly and then slowly. This term represents the variance that is eliminated by doing stratification with proportional allocation, compared with no stratification at all; see Equation (5).

The term (1/k)Sigmas=1k(sigmassigma macr)2, which represents the gain of optimal allocation over proportional allocation, usually first increases with k for k up to 2–10 (depending on the scheme), and then decreases with k. One exception to this is scheme (4). The decrease is important for some of the schemes (eg (1), (2), (5)) and less important for others. This decrease could be explained intuitively by the fact that the number of strata increases, the variances within the strata (the sigmas's) tend to get smaller and so their variation decreases.

The variance of the stratified estimators decreases with the number of strata for all the schemes and both types of allocations (proportional and optimal).

With scheme (1) (no CV), the stratification with proportional allocation reduces the variance per run from 77 246 to 8616 with k=10, and to 2354 when k right arrow infinity (in the limit). With optimal allocation, it is reduced further to 5701 with k=10 and to 2013 when k right arrow infinity. Thus, we gain by a factor of more than 38.

With schemes (2) and (3), the CV brings practically no additional gain to stratification with proportional allocation. For scheme (2), it even increases the variance when k is less than about 100. This is explained by the fact that beta* (whose value is 0.390 here) is not really optimal for this scheme. The optimal coefficient betascp* (given in the table) depends on k and it becomes close to beta* (but not equal) when k right arrow infinity. With the optimal allocation, the CV gives some gain when k is large. But it also increases the variance when k is small in scheme (3); this is because the coefficient betascp* that we use is optimal only for the proportional allocation. Without stratification (k=1), these schemes reduce the variance by a factor of 2.

Scheme (4) gives the best results, with both proportional and optimal allocations. The performance is also good even for small values of k, which is quite interesting: there is no need to use a large number of strata (at least for this particular example). For this scheme, each coefficient betasc, s* is optimized to reduce sigmas independently across strata, and the CV works on both components of Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author. The sigmas's tend to be smaller and their variation is also smaller.

Scheme (5), in which the CV and its coefficient are functions of U, is not doing better than Scheme (4). When k right arrow infinity the two schemes become equivalent, so there is not much difference between them when k is large. But for small k, scheme (5) gives a much larger variance with both the proportional and optimal allocations. We found this result rather surprising at first. However, it can be explained by the fact that second term of (22) is much larger than that of (21) in some strata, in this example, when k is small. The last two columns of Table 4 compare these two terms, which represent the values of Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author for the two schemes, in each stratum, for k=20. We see that the values are much larger for scheme (5) than for scheme (4), especially for the strata where U is close to 0 or 1. The term Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author is smaller for scheme (5) than for scheme (4), but only by a very small amount.


The results for L, given in Table 5, are similar. In particular, scheme (4) (clearly) remains the best performer, especially for small or moderate k. Some minor changes are that here, scheme (2) does not increase the variance compared with scheme (1), and the variation of the standard deviations across strata is a decreasing function of k for all the schemes.


Top

6 Conclusion

We have studied how to combine stratification with respect to a few uniform random numbers that drive the simulation, with one or more CVs. Our variance analysis and empirical results have exhibited some unexpected behaviour in the combination. Among the different combination schemes that we have discussed, based on our analysis and experimentation, we recommend scheme (4), with a moderate value of k. If we prefer a large k, then the optimal CV coefficients should probably be estimated by approximating the variances and covariances by smooth functions of u, via least squares. In our empirical experiments with other examples, this scheme never performed much worse (and usually better) than the other schemes. Our detailed example provides insight by showing how the different variance and covariance components vary as functions of design parameters such as the number of strata, as functions of the uniform on which we stratify, and with the combination scheme. It also provides ideas on how to implement the method in practice.

Top

References

  1. Atlason J, Epelman MA and Henderson SG (2004). Call center staffing with simulation and cutting plane methods. Ann Opns Res 127: 333–358. | Article |
  2. Avramidis AN, Deslauriers A and L'Ecuyer P (2004). Modeling daily arrivals to a telephone call center. Mngt Sci 50(7): 896–908.
  3. Avramidis AN and Wilson JR (1996). Integrated variance reduction strategies for simulation. Opns Res 44: 327–346.
  4. Booth TE and Pederson SP (1992). Unbiased combinations of Nonanalog Monte Carlo techniques and fair games. Nucl Sci Eng 110: 254–261. | ChemPort |
  5. Bratley P, Fox BL and Schrage LE (1987). A Guide to Simulation 2nd edn. SpringerVerlag: New York, NY.
  6. Buist E and L'Ecuyer P (2005). A java library for simulating contact centers. In: Armstrong F, Joines JA, Steiger N and Kuhl ME (eds). Proceedings of the 2005 Winter Simulation Conference. IEEE Press: Pistacaway, NJ, pp 556–565.
  7. Cez cedilik MT and L'Ecuyer P (2008). Staffing multiskill call centers via linear programming and simulation. Mngt Sci, Forthcoming.
  8. Cheng RCH (1986). Variance reduction methods. In: Henriksen JO, Roberts SD and Wilson JR (eds). Proceedings of the 1986 Winter Simulation Conference. IEEE Press: Pistacaway, NJ, pp 60–68.
  9. Cochran WG (1977). Sampling Techniques, 2nd edn. John Wiley and Sons: New York, NY.
  10. de Boor C (1978). A Practical Guide to Splines. Number 27 in Applied Mathematical Sciences Series. Springer-Verlag: New York.
  11. Fishman GS (1996). Monte Carlo: Concepts, Algorithms, and Applications. Springer Series in Operations Research. Springer-Verlag: New York, NY.
  12. Gans N, Koole G and Mandelbaum A (2003). Telephone call centers: Tutorial, review, and research prospects. Manuf Serv Opns Mngt 5: 79–141.
  13. Glynn PW (1994). Efficiency improvement techniques. Ann Opns Res 53: 175–197. | Article |
  14. Glynn PW and Szechtman R (2002). Some new perspectives on the method of control variates. In: Fang K-T, Hickernell FJ and Niederreiter H (eds) Monte Carlo and Quasi-Monte Carlo Methods 2000. Springer-Verlag: Berlin, pp 27–49.
  15. Hickernell FJ, Lemieux C and Owen AB (2005). Control variates for quasi-Monte Carlo. Stat Sci 20(1): 1–31. | Article |
  16. Lavenberg SS and Welch PD (1981). A perspective on the use of control variables to increase the efficiency of Monte Carlo simulations. Mngt Sci 27: 322–335.
  17. L'Ecuyer P and Buist E (2005). Simulation in java with SSJ. In: Armstrong F, Joines JA, Steiger N and Kuhl ME (eds). Proceedings of the 2005 Winter Simulation Conference. IEEE Press: Pistacaway, NJ, pp 611–620.
  18. L'Ecuyer P and Buist E (2006). Variance reduction in the simulation of call centers. In: Nicol D, Fujimoto R, Lawson B, Liu J, Perrone F and Wieland F (eds). Proceedings of the 2006 Winter Simulation Conference. IEEE Press, Pistacaway, NJ, pp 604–613.
  19. L'Ecuyer P and Lemieux C (2000). Variance reduction via lattice rules. Mngt Sci 46(9): 1214–1235.
  20. Whitt W (1999). Dynamic staffing in a telephone call center aiming to immediately answer all calls. Opns Res Lett 24: 205–212. | Article |
Top

Acknowledgements

This research has been supported by Grants OGP-0110050 and CRDPJ-320308 from NSERC-Canada, a Canada Research Chair, and a grant from Bell Canada via the Bell University Laboratories, to the first author. The second author benefited from an Industrial Scholarship from NSERC-Canada and the Bell University Laboratories. The paper was written while the first author was at IRISA, in Rennes, France.