
Journal of Targeting, Measurement and Analysis for Marketing (2007) 15, 103–112. doi:10.1057/palgrave.jt.5750036

How to evaluate campaign response — The relative contribution of data mining models and marketing execution

Tom Breur1

Correspondence: Tom Breur, XLNT Consulting — 'turning data into dollars', Langestraat 8-03, Tilburg 5038 SE, The Netherlands. Tel: +31 6 463 468 75; E-mail: tombreur@xlntconsulting.com

1Tom Breur runs the consulting firm XLNT Consulting (www.xlntconsulting.com), dedicated to helping companies make more money with their data. His fields of interest span data mining, analytics, data quality, IT governance, data warehousing and business models.

Received 2 March 2007; Revised 2 March 2007.


Abstract

Measuring campaign effectiveness is very important. After all, you need to measure to manage. Data mining has been introduced into mainstream marketing, and at the moment it is used most frequently to improve targeting. 'Closing the loop' is key in state-of-the-art database marketing. It means testing–measuring–tweaking campaigns. Passes through these cycles are run at increasingly higher speeds. By manipulating both marketing execution and targeting, one attempts to increase response. Since these effects operate simultaneously, the influences they exert get mingled. As a consequence, measuring the effectiveness of campaigns is slightly more complicated. The author describes a comprehensive test design to evaluate the relative contribution of marketing execution and data mining models in increasing response. As data mining models get reused, their effectiveness over time needs to be tracked. This framework includes both one-off evaluation and longitudinal monitoring of data mining models and marketing execution.

Keywords:

database marketing, list management, direct marketing, test design


INTRODUCTION

The rise of CRM has led to many innovations in marketing practices. Often, the business focus has shifted from market share to customer share. In parallel with this, allocation of marketing budget is no longer seen as an expense, but rather as an investment in the relation with the customer. Therefore, these allocation decisions are subject to the same kind of scrutiny any other investment decision is subject to: what is the return on investment? There is a growing tendency for marketers to be held more directly accountable for the results of their marketing efforts and expenditures.

An important aspect of the new marketing paradigm is 'closing the loop'. 'Closing the loop' means that results of past marketing campaigns are fed back into the organisation, leading to a continuous learning process. New generations of Customer Relationship Optimisation and Marketing Automation tools enable running multiple campaigns in parallel. To achieve organisational learning, it is important to track results closely, and compare results across campaigns. In particular, when running many simultaneous campaigns, it becomes even more important to do this simply and consistently.

'Closing the loop' implies continuous adaptation of successive campaigns by learning from response in past campaigns. Best-practice database marketing is to use each consecutive campaign as an opportunity to improve on the offer as well as the targeting. Evaluation of past campaigns leads to new hypotheses, which are then tested in new campaigns, leading to new response data. These are then evaluated in the light of previous hypotheses, and so forth. In this way one 'closes the loop', and a continuous learning cycle is set in motion.

Driven in part by decreasing response to direct marketing campaigns, marketers are turning increasingly to the use of data mining techniques. Predictive data mining technology helps marketers provide more value to their customers by communicating the right offer to the right customer at the right moment. One could say that increased response percentages are the customer's implicit acknowledgment of higher relevance of the offer.

Now, when both the marketing offer (execution) and targeting (data mining model) are tested at the same time, an issue of confounding arises: should the higher response in the new campaign be attributed to an improvement of the marketing offer or to better targeting? How do we unravel these two? That is the question the author answers in this paper. To this end, provisions have to be made in the test design for the campaign. These provisions are not very complicated. They, however, do need to be considered prior to planning and executing the campaign. How the customer selection process should best be designed as well as which metrics to use are the subject of this paper.

In this paper, the author proposes a test design, together with the measures and statistics that will provide managers, marketers and decision-makers in general with a framework to help campaign planning and evaluation.


TEST DESIGN — OVERCOMING CONFOUNDING

In order to deal with the issue of confounding, a test design needs to be used that systematically unravels the effects of marketing execution and targeting. Confounding consists of two effects (ie marketing execution and targeting) simultaneously exerting an influence on response. So if customers in the target group get an offer and the response rate is significantly higher than for customers who did not get the offer, how can we tell whether this should be attributed to the selection made (targeting effect), or because the marketing offer was so compelling (execution effect)? The answer to this problem lies in the use of control groups. We try to disentangle two effects; therefore, we need two control groups to achieve this.

Demonstrating the marketing execution effect

In this paper, the author will use a very broad definition of 'marketing offer'. It may consist of any effect we are trying to measure that can be attributed to actions that are under voluntary marketing control. This effect results in a difference in response between two otherwise equivalent customer groups when one of them undergoes some (marketing) treatment and the other does not. In this broad definition, the marketing offer could be a certain product with or without an introductory discount, a piece of direct mail, an outbound telemarketing call or a premium that is offered along with a product. It could equally be a web banner, a pop-up screen or a personalised internet page. It can also be a combination of marketing efforts that together form some elaborate treatment condition for a group of customers.

For the marketing offer, or execution effect, one needs to compare exactly equivalent customers who got the offer versus those who did not. The difference in response behaviour between these two groups then demonstrates the effect of the marketing offer. It is crucial that there be no systematic difference whatsoever between customers who got the offer and customers who did not. If the offer is only made to customers who have, say, three or more products, one cannot compare this group with all other customers who did not get the offer. One could then never be sure whether the higher response in the target group arises because customers with three or more products simply like the company better, and therefore respond more often. Any other potential cause that is tied to owning three or more products could equally provide the explanation. In such cases, the a priori difference between customers with three or more products and customers with fewer could account for the difference in response. As a matter of fact, the characteristics that explain differences in propensity to respond are exactly the kinds of 'nuggets of insight' one would like to learn from the evaluation after the campaign.

Since there is absolutely no way to know beforehand exactly what might influence response, it is essential that the allocation of customers to the treatment versus no-treatment condition is done by means of a deliberate randomising procedure. Of course, the same holds for offer versus no offer, premium versus no premium, etc. Randomisation is the only way to be absolutely sure there are no systematic effects 'hidden' in some way in the selection procedure of customers for the treatment. And these so-called 'hidden selection effects' can be tricky! Any customer file in a database is ordered in some way, for instance by account number. Historical procedures for allocating customer numbers may carry some kind of tenure effect. Even something as apparently 'harmless' as alphabetic ordering may contain some effect correlated with last names. As Pyle1 puts it: 'The potential problem comes from the fact that very often, and frequently unbeknownst to the miner, the data in a source data set is in some sort of order' (p. 483). Thoughtful randomisation is the only way to go.
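As an illustration of such a deliberate randomising procedure, the following is a minimal sketch in Python. The function name `random_split`, the seed value and the example customer file are hypothetical and chosen only for illustration; the point is that the allocation depends on a seeded shuffle and not on the order in which customers happen to be stored.

```python
import random

def random_split(customer_ids, holdout_fraction, seed=2007):
    """Randomly split customers into (treated, withheld) groups.

    Shuffling a copy removes any dependence on the file's original
    ordering (account number, tenure, alphabet, ...).
    """
    if not 0.0 <= holdout_fraction <= 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    rng = random.Random(seed)        # seeded so the draw can be reproduced
    shuffled = list(customer_ids)    # copy; the input order is left untouched
    rng.shuffle(shuffled)
    n_withheld = round(len(shuffled) * holdout_fraction)
    return shuffled[n_withheld:], shuffled[:n_withheld]

# Hypothetical customer file ordered by account number.
customers = list(range(1, 1001))
treated, withheld = random_split(customers, holdout_fraction=0.10)
```

Fixing the seed is a design choice that makes the selection auditable afterwards; any other reproducible source of randomness would serve the same purpose.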

Demonstrating the targeting effect

When campaigns are enhanced through the use of data mining, the aim typically is to zoom the selection in on the customers that are most likely to respond. The general approach is to use implicit feedback from customers to develop a model. This feedback is displayed implicitly by historical response behaviour. What a data mining model does is to point to customers who are look-alikes of customers who responded in the past. Then, on the basis of this model, one selects customers who resemble previous responders: resemblance in the way these customers appeared prior to responding.2

The targeting effect that results from the use of a data mining model is embedded in the selection. In this context, selection refers to the fact that the target group is a deliberately chosen, biased sub-group from a larger population. Therefore, the targeting effect is unveiled by comparing the target group to the population at large. With the help of statistics, these comparisons can be made on the basis of samples. So one would typically draw a random (!) sample from the population to get an estimate of the response probability in the overall population. This random sample can then be compared with the target group, and the difference in response is a measure of the targeting effect.3

As an aside, instead of random sampling from the population, it is also possible to use a stratified sampling method. Since the distribution of response in the population is inferred from the model itself, and not some 'external' stratification variable, this makes this stratifying procedure statistically rather complex. And, of course, a weighting procedure is then needed to make group comparisons with this stratified sample of the population. In most practical Direct Marketing applications, the gain in efficiency is only marginal. Therefore, the benefits from stratifying will usually not justify the added complexity from this stratification procedure. A reference to such a stratification method can be found in Mayer and Sarkissian.4

Making test-design selections

For the sake of completeness, the author presents a recommendation on the steps in the selection process for test designs. The exact sizes of each group need to be determined using statistical power analysis. The way to calculate group sizes will be discussed in the section 'Statistics — power analysis'. In this section, we will only discuss the order in which groups are best determined, to make evaluating test-results as convenient as possible.

The selection process begins by determining the base population for a certain offer. Basically, this comes down to defining all possible prospects within the customer database that the company would be willing to sell the product to. There should be no a priori selection criteria employed here. Any customer who is eligible to take the product belongs to the base population.

After the base population is determined, the first step is to draw a random sub-sample from this base population. Then we use the data mining model to score all customers in the base population for their propensity to respond. This will make it possible to sort the database by model score. The cut-off point in this sorted file can be made on the basis of either explicit cost/yield considerations or external considerations (typically budget constraints). Customers who fall above the cut-off point (segments 1 and 2 in Figure 1) can eventually wind up in the 'target group'. Within the subpopulation with propensity scores above the cut-off point, however, another random subsample is drawn first. This group will be excluded from the offer even though, according to the model, these customers have the highest propensity to respond. These high-propensity customers from whom the offer is withheld are labelled the 'reference group'. Because the reference group is drawn at random from the same subpopulation, this procedure ensures that it does not differ systematically from the target group.
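The selection steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function `select_groups`, its parameters and the made-up scores are hypothetical, and the choice to keep random-group members out of the target and reference groups is an assumption of this sketch, not something the text prescribes.

```python
import random

def select_groups(scores, random_size, depth, reference_size, seed=2007):
    """Split a scored base population into random (R), target (T)
    and reference (F) groups.

    scores: dict mapping customer id -> model propensity score.
    depth:  fraction of the base population above the cut-off point.
    """
    rng = random.Random(seed)
    ids = sorted(scores)                     # fixed starting order for reproducibility
    # Step 1: draw the random group from the whole base population.
    random_group = set(rng.sample(ids, random_size))
    # Step 2: rank everyone by model score and apply the cut-off.
    ranked = sorted(ids, key=lambda cid: scores[cid], reverse=True)
    n_above = round(len(ranked) * depth)
    # Assumption of this sketch: random-group members stay in R only.
    above_cutoff = [cid for cid in ranked[:n_above] if cid not in random_group]
    # Step 3: withhold a random subsample of high scorers as the reference group.
    reference_group = set(rng.sample(above_cutoff, reference_size))
    target_group = set(above_cutoff) - reference_group
    return random_group, target_group, reference_group

# Made-up scores for a base population of 1,000 customers.
score_rng = random.Random(1)
scores = {cid: score_rng.random() for cid in range(1000)}
R, T, F = select_groups(scores, random_size=100, depth=0.20, reference_size=20)
```

In a real campaign the group sizes passed in here would come from the power analysis rather than being picked by hand.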

Figure 1. Target group and two control groups

The recommended order in the selection process is to first determine the random group and then the target group. This ordering could be reversed, but that would be impractical. In the end, essentially the same customers would be approached, and it only makes the evaluation tests computationally more complex, and therefore unnecessarily cumbersome.5

In Figure 1, the groups described so far are marked. Only customers in the grey boxes will get the marketing offer. For ease of explanation, we will segment the population into ten equal-sized groups (deciles). In this example, the first two deciles fall above the cut-off point. Equal-sized segments are, however, not required, as long as the customers are ordered according to their propensity to respond.

Let p be the fraction of respondents, T the target group, F the reference group and R the random group.

Then
p(T)-p(F)=marketing effect
and
p(T)-p(R)=targeting effect

Given that these groups have been selected, the following two tests can be performed. First, to check for the execution (marketing) effect, compare the target group with the reference group to show the extent to which the marketing offer was successful in eliciting response. Secondly, to check for the targeting effect, compare the response in the random group with the response in the target group. This will give an empirical estimate of the accuracy of the model.6

By making selections in this way, one can disentangle the effects of the model versus the marketing offer. Confounding is overcome and the effects of both marketing and targeting can be measured by making the appropriate group comparisons.
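The two group comparisons can be written down directly from the definitions of p, T, F and R. The sketch below uses hypothetical function names and made-up response counts purely to show the arithmetic.

```python
def response_rate(responders, group_size):
    """Fraction of respondents p in a group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return responders / group_size

def campaign_effects(p_target, p_reference, p_random):
    """Return (marketing effect, targeting effect) as defined in the text."""
    marketing_effect = p_target - p_reference   # p(T) - p(F)
    targeting_effect = p_target - p_random      # p(T) - p(R)
    return marketing_effect, targeting_effect

# Made-up counts for illustration only.
p_T = response_rate(540, 18_000)   # target group
p_F = response_rate(20, 2_000)     # reference group (no offer)
p_R = response_rate(50, 5_000)     # random group
marketing, targeting = campaign_effects(p_T, p_F, p_R)
```

Whether such differences are large enough to be trusted is a separate question that the statistical tests later in the paper address.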


TEST DESIGN — MODEL TRACKING

As the term generally is used, a predictive data mining model will predict future response on the basis of historical data. As market conditions and customer characteristics change, the accuracy of predictive models tends to decay over time. Therefore, the performance of a data mining model needs to be tracked when it is reused.

There is no common measure to compare the predictive accuracy of classification models. This is because no classification measure is universally better. What 'better' means exactly depends on whether one looks at false positives or false negatives, on the targeting depth at which the classifier should perform optimally, and on differing misclassification costs. Because this problem has several dimensions, no single evaluation number can always be best.

Having said this, a commonly used measure to compare the predictive accuracy of data mining models is lift. We define lift here as the ratio of response in the target group over response in the random group. The higher this ratio, the better. One of the nice features of lift is that it can be used to make comparisons across algorithms (eg Neural Network, Nearest Neighbour, Regression, Decision Tree), each with their own statistical 'fitness measures'. This definition of lift, however, does have some drawbacks. The most important drawback is that lift is a (monotonically decreasing) function of targeting depth; hence one can only make direct comparisons at identical targeting depth. Also, lift is not invariant to random response: there is an inverse correlation between lift and random response. In particular, when a very small proportion of the population is targeted, lift is higher for low random response rates. To overcome this problem, one can use a model evaluation measure that is independent of the overall response rate. Rosset et al.7 coined the term RNR (response to nonresponse ratio) for a measure with this property. The only drawback of RNR is that it lacks the straightforward interpretation that lift holds.

Let $p_d$ be the fraction of respondents at targeting depth $d$.

Lift: $p_d(T) / p(R)$

Let $N$ be the size of the population.

[Equation not available: definition of RNR]

More information on issues surrounding lift and fitness measures for classification models can be found in Piatetsky-Shapiro and Masand,8 Hand9 and Rosset et al.7

The lift of a model is inversely related to the percentage of the population that is targeted. If a larger percentage of the population is targeted, lift will inevitably drop. As a consequence, one cannot compare lift across campaigns if a different proportion (depth of selection) of the population has been targeted. Selection depth d is defined as the number of customers above the cut-off point divided by the number of customers in the base population. One should compare lift at equivalent targeting depth. So lift is then compared at depth d = min{d1, d2}, where d1 is the selection depth in campaign 1 and d2 is the selection depth in campaign 2, or, more generally, at d = min{d1, d2, ..., dn}.
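A minimal sketch of comparing lift at the common depth min{d1, d2} is given below. The function `lift_at_depth` and all outcome data are hypothetical. For simplicity the sketch treats the ranked list as if every customer above the cut-off had been approached, which ignores the customers held back in the reference group.

```python
def lift_at_depth(ranked_responses, base_population, depth, random_rate):
    """Lift p_d(T) / p(R) using only customers within targeting depth d.

    ranked_responses: 0/1 outcomes of approached customers,
                      best model score first.
    """
    if random_rate <= 0:
        raise ValueError("random_rate must be positive")
    n = round(base_population * depth)
    if n == 0 or n > len(ranked_responses):
        raise ValueError("depth lies outside the range that was actually targeted")
    p_d = sum(ranked_responses[:n]) / n
    return p_d / random_rate

# Made-up outcomes, best-scored customers first, base population of 1,000.
campaign_1 = [1] * 8 + [0] * 92 + [1] * 2 + [0] * 98   # targeted to depth d1 = 0.20
campaign_2 = [1] * 6 + [0] * 94                        # targeted to depth d2 = 0.10
common_depth = min(0.20, 0.10)

lift_1 = lift_at_depth(campaign_1, 1_000, common_depth, random_rate=0.02)
lift_2 = lift_at_depth(campaign_2, 1_000, common_depth, random_rate=0.02)
lift_1_full = lift_at_depth(campaign_1, 1_000, 0.20, random_rate=0.02)
```

In this made-up example, campaign 1 at its full depth of 0.20 shows a lower lift than campaign 2, yet at the common depth of 0.10 it shows a higher one, which is why the comparison has to be made at equal depth.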

To make these tests, it is essential that a random group be selected alongside the target group on each consecutive run. One compares the ratio of p1(T)/p1(R) with p2(T)/p2(R). It is not appropriate to compare p1(T) with p2(T). Imagine the following situation: from one campaign to the next, the response in the target group drops. Without a random group, there is no way to determine whether this decrease in response should be attributed to deterioration of the model or whether the product has just become less attractive for the target population. Comparing random response rates in subsequent campaigns is the only way to measure a priori attractiveness of the marketing offer. The ratio of pi(T)/pi(R) at the appropriate targeting depth determines the predictive classification accuracy of a model, not just pi(T).
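The situation described above can be made concrete with a small sketch. The function name and the response rates are made up; both campaigns are assumed to be measured at the same targeting depth.

```python
def compare_campaigns(p1_target, p1_random, p2_target, p2_random):
    """Compare two campaign runs by lift rather than by raw target response."""
    lift_1 = p1_target / p1_random
    lift_2 = p2_target / p2_random
    return {
        "lift_1": lift_1,
        "lift_2": lift_2,
        "lift_ratio": lift_2 / lift_1,                   # below 1: model has degraded
        "random_response_ratio": p2_random / p1_random,  # below 1: offer less attractive
    }

# Hypothetical rates: target response drops from 4.0% to 3.0%,
# random response drops from 1.0% to 0.75%.
result = compare_campaigns(0.040, 0.0100, 0.030, 0.0075)
```

Here the target response fell by a quarter, but so did the random response, so the lift is unchanged. Under these made-up numbers the drop would be attributed to the offer having become less attractive, not to deterioration of the model.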


TEST DESIGN — EVALUATION COSTS

There is always some cost involved in testing. These costs can be broken down into two components. There is complexity cost in dealing with a more elaborate campaign design. The author will make no attempt here to quantify these complexity costs; suffice it to acknowledge that this extra complexity needs to be managed. And then there are also opportunity costs, because a test design inevitably involves measuring the response of control groups with lower response rates than the target group. By determining the revenue minus cost for the case without a reference group, and subtracting from this the revenue minus cost for the case with a reference group, we can calculate the opportunity cost for the reference group.10 This is straightforward; the derivation will be given in Appendix A.

Let NF be the number of customers in the reference group, Y the yield per respondent, p(T) the response percentage in the target group, p(F) the response percentage in the reference group and C the marginal cost of making an offer (eg incremental price of one extra mail piece).

Then

$$ \text{Opportunity cost}_F = N_F \,\bigl[\, Y \,\bigl(p(T) - p(F)\bigr) - C \,\bigr] \qquad (1) $$

This equation can give a negative outcome if the cost of making an offer is not justified by the yield from doing so. In the case of such an unpleasant surprise, one would have hoped for a reference group that was as large as possible!
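The recipe given in the text (revenue minus cost without a reference group, minus revenue minus cost with a reference group) can be sketched as below. The function name and all figures are hypothetical. The sketch assumes that reference-group customers would have responded at the target-group rate p(T) had they received the offer, and that every respondent yields the same amount Y.

```python
def reference_group_opportunity_cost(n_reference, yield_per_respondent,
                                     p_target, p_reference, cost_per_offer):
    """Opportunity cost of withholding the offer from the reference group."""
    # Case without a reference group: these customers receive the offer.
    profit_if_approached = n_reference * (
        p_target * yield_per_respondent - cost_per_offer)
    # Case with a reference group: no offer cost, only response at rate p(F).
    profit_if_withheld = n_reference * p_reference * yield_per_respondent
    return profit_if_approached - profit_if_withheld

# Made-up figures: 2,000 withheld customers, yield 50 per respondent.
cost_cheap_offer = reference_group_opportunity_cost(2_000, 50.0, 0.03, 0.01, 0.75)
cost_dear_offer = reference_group_opportunity_cost(2_000, 50.0, 0.03, 0.01, 1.25)
```

With the more expensive offer in this example the result is negative: the extra response triggered by the offer does not pay for the cost of making it.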

The opportunity cost for the reference group was calculated in expression (1). Now we will have a look at the opportunity cost for the random group, which can be calculated analogously. By determining the revenue minus cost for the case without a random group, and subtracting from this the revenue minus cost for the case with a random group, we can calculate the opportunity cost for the random group. Again, this is straightforward; the derivation is given in Appendix A.

Let NR be the number of customers in the random group, Y the yield per respondent, p(R) the response percentage in the random group and C the marginal cost of making an offer (eg incremental price of one extra mail piece).

Then

$$ \text{Opportunity cost}_R = N_R \,\bigl[\, C - Y \, p(R) \,\bigr] \qquad (2) $$
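Following the same recipe for the random group, a sketch could look as follows. It rests on a simplifying assumption that is not stated in the text: without a random group, these customers would not have been approached at all and would have generated neither revenue nor cost. The function name and the figures are hypothetical.

```python
def random_group_opportunity_cost(n_random, yield_per_respondent,
                                  p_random, cost_per_offer):
    """Opportunity cost of approaching a random group."""
    # Assumption: without a random group these customers are not approached.
    profit_without_group = 0.0
    # With a random group: they receive the offer and respond at rate p(R).
    profit_with_group = n_random * (
        p_random * yield_per_respondent - cost_per_offer)
    return profit_without_group - profit_with_group

# Made-up figures: 5,000 customers, yield 50, 1% response, 0.75 per offer.
random_cost = random_group_opportunity_cost(5_000, 50.0, 0.01, 0.75)
```

The outcome is positive whenever the cost of one offer exceeds the expected yield from a randomly chosen customer, which is the usual reason for targeting in the first place.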


STATISTICS — ONE-OFF

What tests need to be performed when evaluating one-off campaigns, and which are the appropriate statistics to use? To begin with, the effects of both marketing execution and targeting call for a one-sided statistical test. Response in the target group can only be expected to be equal to or higher than response in the reference and random groups.

Regarding the performance of the data mining model, a one-sided independent samples t-test is appropriate. Here, the target group is tested against the random group. A t-test may be used for comparing response percentages as well as, for instance, monetary amounts or any other variable that is measured at the interval level. In most practical cases, however, campaign response will be compared as response percentages. Either way, a one-sided independent samples t-test should be used. The difference between testing interval variables and testing response proportions lies in the measure for standard deviation. The formulas for a standard t-test are:

$$ t = \frac{\bar{x}_T - \bar{x}_R}{\sigma_d} $$

where

$$ \sigma_d = \sqrt{\frac{s_T^2}{N_T} + \frac{s_R^2}{N_R}} $$

with $N_T$ and $N_R$ the sizes of the target group and the random group.

For a difference in response rate (two proportions), $\bar{x}_T = p(T)$ and $\bar{x}_R = p(R)$, and:

$s_T$ is the standard deviation in the target group $= \sqrt{p(T)\,(1 - p(T))}$

$s_R$ is the standard deviation in the random group $= \sqrt{p(R)\,(1 - p(R))}$

For a difference in means:

$$ s_T = \sqrt{\frac{\sum_{i=1}^{N_T} (x_i - \bar{x}_T)^2}{N_T - 1}} $$

and likewise for $s_R$.

Note two important properties of these formulas. First, σ_d shrinks only with the square root of the size of the groups: increasing the group sizes four-fold will only decrease the value of σ_d two-fold. Secondly, when testing proportions, σ_d is highest at the 50 per cent level, and becomes very small as the response rate approaches zero. At very low response rates, however, the differences between groups that need to be detected are small as well. Therefore, relatively large groups are needed to test groups with very low response rates.
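A minimal sketch of the one-sided test for two response proportions follows. It assumes the unpooled form of the standard error and uses a normal approximation to the t distribution for the p-value, which is an assumption of this sketch that is reasonable only for large groups. The function name and the figures are hypothetical.

```python
import math

def one_sided_proportion_test(p_target, n_target, p_random, n_random):
    """Test H1: p(T) > p(R). Returns (t statistic, one-sided p-value)."""
    s_t = math.sqrt(p_target * (1 - p_target))   # standard deviation, target group
    s_r = math.sqrt(p_random * (1 - p_random))   # standard deviation, random group
    sigma_d = math.sqrt(s_t ** 2 / n_target + s_r ** 2 / n_random)
    t = (p_target - p_random) / sigma_d
    # Upper-tail probability of a standard normal (large-sample approximation).
    p_value = 0.5 * math.erfc(t / math.sqrt(2))
    return t, p_value

# Made-up campaign: 3% response in 18,000 targeted, 1% in 5,000 random.
t, p_value = one_sided_proportion_test(0.03, 18_000, 0.01, 5_000)

# Four times larger groups halve sigma_d and therefore double t.
t_large, _ = one_sided_proportion_test(0.03, 72_000, 0.01, 20_000)
```

The second call illustrates the square-root property: quadrupling both group sizes buys only a doubling of the test statistic.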


STATISTICS — MODEL TRACKING

When a data mining model is reused, its performance tends to decay over time. Performance is defined here as the accuracy of the model in separating responders from nonresponders. The choice between continuing with a degrading model versus redeveloping a new model is an optimisation problem. The economic value of a model with higher lift can be easily computed by comparing the initial lift of the model11 with the current lift. The lift of the initial model will then be extrapolated to the current random response rate.

The difference between the current p(T) and the p'(T) that would be expected if the initial lift could still be attained is the economic value of an improved model. For a cyclical campaign, this difference may carry over into future runs as well (degrading accordingly). In this calculation, a one-time additional opportunity cost for a random group that is large enough to develop a new model needs to be taken into account. Exactly how large this group needs to be can be extrapolated from the random response rate.
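The calculation can be sketched as follows. The function name, its parameters and the figures are hypothetical. Translating the response difference into money by multiplying with the target group size and a yield per respondent is an assumption of this sketch.

```python
def value_of_restored_lift(initial_lift, current_p_target, current_p_random,
                           n_target, yield_per_respondent, one_time_cost=0.0):
    """Value of regaining the initial lift in a single campaign run."""
    # p'(T): initial lift extrapolated to the current random response rate.
    expected_p_target = initial_lift * current_p_random
    extra_responders = (expected_p_target - current_p_target) * n_target
    # one_time_cost: opportunity cost of the larger random group needed
    # to develop the new model.
    return extra_responders * yield_per_respondent - one_time_cost

# Made-up case: lift has decayed from 4.0 to 3.0 at a 1% random response.
gross_value = value_of_restored_lift(4.0, 0.03, 0.01, 18_000, 50.0)
net_value = value_of_restored_lift(4.0, 0.03, 0.01, 18_000, 50.0,
                                   one_time_cost=2_500.0)
```

For a cyclical campaign the gross value would recur in later runs, gradually shrinking as the new model decays in turn, whereas the one-time cost is incurred once.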


STATISTICS — POWER ANALYSIS

In a campaign, evidence from the evaluation should be conclusive. At the same time, opportunity costs for control groups need to be reduced as much as possible. These are contradictory goals. There is a triangular relation between effect size, group size and significance level.

From a business perspective, the objective is to minimise the opportunity costs for measuring campaign effectiveness. We will try to balance this economic goal with the statistical need for conclusive evidence. Clearly, there is a trade-off here. Larger group sizes allow for stronger conclusions, but also result in higher opportunity costs. Determining appropriate group sizes is essentially a business choice. How small are the differences in response that one would want to be able to label as significant? Or, what chance are you willing to take that the conclusion of a difference in response between groups was reached because of a haphazard fluctuation? Or, how much uncertainty is acceptable when reaching the conclusion that there is no difference between groups? Following from the business needs, an effect size should be specified: typically the minimal effect that is worthwhile to pursue as a business opportunity. It is this economically meaningful effect size that one should be able to establish with appropriate certainty.

To clarify matters, let us begin by making the decision matrix explicit. There are four possible decision outcomes, two of which lead to erroneous conclusions. These statistical errors need to be minimised. In statistical terms, the H0 hypothesis will be that there is no difference between groups. The H1 hypothesis will be that there is a significant difference between groups. Depending on the kind of test we are performing, either one- or two-sided, the hypotheses will be:

One-sided: H0: group 1 ≤ group 2; H1: group 1 > group 2
Two-sided: H0: group 1 = group 2; H1: group 1 ≠ group 2

Type 1 error is controlled by setting alpha to some pre-defined threshold. From this threshold follows a minimal effect size (eg difference between groups) that needs to be observed, given the group sizes. From this same threshold alpha immediately follows beta, because any observed effect smaller than this threshold results in not rejecting H0 (Figure 2).


Triangular relation between effect size, alpha and group sizes

When effect size (the difference between group means) is constant, larger group sizes will lead to smaller significance levels (p-value or alpha). This means that the statistical conclusion to establish a significant difference applies with a higher degree of certainty. The larger the group, the more conclusive the outcome.

When significance level is fixed, a smaller effect requires larger group sizes. If business stakeholders assert that a difference in response percentage of 1 per cent between the target group and the reference group should lead to the conclusion of a statistically significant difference, this will require larger group sizes than in the case of a 3 per cent difference being discernible.

In the third scenario of fixing group sizes, the significance level (p-value or alpha) is inversely related to the effect size (response difference between groups) we try to establish. If the difference in response is large, we will be more convinced this is not due to random fluctuation. In other words, the p-value or alpha for rejecting the hypothesis of no difference will be low: the chance is small that we reached this conclusion of a significant difference because of a statistical fluke. Conversely, when observing small differences between groups, the chance that our conclusion of different response is a statistical error will rise.

Within the context of this paper, we will use statistical power analysis to determine appropriate group sizes, notably the sizes of the random and reference groups. Statistical power can be defined as the chance of obtaining a significant result when the assumed effect truly exists.

Determining appropriate group sizes

Ideally, this business choice should be reached on the basis of the statistics at hand. We can do this by showing how the chance of finding a significant effect changes as a function of the group sizes and the (assumed) true magnitude in difference between groups.

Deciding on the group sizes inevitably calls for making estimates of the expected difference in response rate. In the case of a recurring campaign, historical response rates can be used as predictors. For a first-time campaign, guesstimates may need to be used. In case of uncertainty over expected response rates, statistical power can be calculated under an array of response scenarios.
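Calculating power under an array of response scenarios can be sketched as below. The sketch uses a normal approximation with an unpooled standard error, which is an assumption and not necessarily the exact procedure intended by the text. The function name, the default alpha of 0.05, the group sizes and the response rates are all hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided(p_target, n_target, p_control, n_control, alpha=0.05):
    """Approximate power of the one-sided test H1: p(T) > p(control)."""
    sigma_d = math.sqrt(p_target * (1 - p_target) / n_target
                        + p_control * (1 - p_control) / n_control)
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value of the test
    # Chance that the observed difference exceeds the critical value
    # when the true difference is p_target - p_control.
    shift = (p_target - p_control) / sigma_d
    return 1 - NormalDist().cdf(z_alpha - shift)

# Hypothetical scenarios: (assumed p(T), assumed p(F)).
scenarios = [(0.020, 0.010), (0.015, 0.010), (0.012, 0.010)]
for n_reference in (500, 2_000, 8_000):
    for p_t, p_f in scenarios:
        power = power_one_sided(p_t, 18_000, p_f, n_reference)
        print(f"N_F={n_reference:>5}  p(T)={p_t:.3f}  p(F)={p_f:.3f}  "
              f"power={power:.2f}")
```

Laying the results out per group size and scenario, next to the opportunity cost of each group size, is one way to make the business trade-off explicit.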

Besides comparing response in the target group with the random group, another possible application of power analysis is to track how the a priori attractiveness of an offer changes over time. A priori attractiveness of a product is measured by the random response rate. To test whether an offer is still as attractive as in the previous campaign, we need to compare the response rate in the random group for two subsequent campaigns. In this case, we can make no assumptions as to the direction of the difference, so we need to use a two-sided significance test. The difference in response, either positive or negative, will again be tested by an independent samples t-test, this time two-sided. There are two reasons why this test has less power. First, a two-sided test has lower statistical power than a one-sided test at the same significance level. Secondly, since one compares two random groups, instead of a random group and the target group (which is larger), the t-statistic will have a much larger σ_d. The practical business consequence is that to keep group sizes within reasonable limits, one will need to settle for less certainty about the conclusions one is trying to draw.
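The power of such a two-sided comparison between two random groups can be sketched in the same way. As before, the normal approximation with an unpooled standard error is an assumption of this sketch, and the function name and figures are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_sided(p_1, n_1, p_2, n_2, alpha=0.05):
    """Approximate power of the two-sided test H1: p_1 != p_2."""
    nd = NormalDist()
    sigma_d = math.sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    z = nd.inv_cdf(1 - alpha / 2)        # two-sided critical value
    shift = abs(p_1 - p_2) / sigma_d
    # Chance of landing in either rejection region.
    return nd.cdf(shift - z) + nd.cdf(-shift - z)

# Made-up case: random response falls from 1.0% to 0.8%
# between two campaigns, with random groups of 5,000 each.
power_5000 = power_two_sided(0.010, 5_000, 0.008, 5_000)
power_50000 = power_two_sided(0.010, 50_000, 0.008, 50_000)
```

With random groups of a few thousand customers, a shift of this size is detected in well under half of the cases under these assumptions, which illustrates why one has to settle for less certainty here.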

An elaborate treatment on the issues surrounding statistical power analysis can be found in Cohen.12


CONCLUSION

Both the marketing execution and the data mining model cause an increase in response. The distinction between the two is not trivial. Because we typically only want to run the marketing campaign on our 'best' customers, the increase in response that is caused by the specifics of the marketing offer and the increase that is attributable to selecting exactly the right target group are confounded. To disentangle these effects, one needs to set up a test design to measure the effect of each in a single campaign run. Because there are two effects we are trying to disentangle, we need two control groups. The random group (R) is used to measure the targeting effect, and the reference group (F) is used to measure the marketing effect.

One needs to compare a group that 'has' the targeting effect (T) with a group that does not 'have' this effect (R). To make sure that the only systematic difference between these two groups is this targeting effect, the random group needs to be exactly that: purposely drawn at random. For the marketing effect, we compare two otherwise equivalent groups, by randomly drawing the reference group (F) from within the group of customers with the highest propensity to respond. We thus ensure that there are no systematic differences between the target group (T) and the reference group (F).

The reason for simultaneously testing these two effects with every campaign should become clear through an example. Imagine the following situation. Response in the target group (T) is much higher than response in the random group (R). Great! The data mining model has done a very accurate job of discriminating responders from nonresponders. Response in the target group (T), however, is only barely higher than in the reference group (F). In a case like this, the data mining model may be very good at predicting which customers are likely to respond, but most of them would have responded without the offer as well: we are predicting autonomous response quite accurately. Only when response rates between target and reference group differ substantially will the marketing effort be worthwhile from a business perspective. Even though the lift of the model may be very good, the net added value of running the campaign is poor. Without this reference group test, the campaign would falsely be deemed a success. It is clear that only triggered (as opposed to autonomous) response is worth spending marketing dollars on.

A similar case can be made for great execution and poor targeting. When response in the target group is much higher than in the reference group, the offering is evidently quite effective in triggering response. If, in this same campaign, however, the difference between the target group and the random group is small, our data mining model is not very accurate. Then the marketing campaign might be deemed a success, although the model has performed poorly, and consequently targeting was not very precise.

Choosing adequate group sizes calls for statistical power analysis. When planning a campaign, one tries to balance two conflicting goals: keeping control groups as small as possible, while still reaching conclusive results from the campaign evaluation. The opportunity costs for the reference and random group can be calculated using equations (1) and (2). From these, the relation between opportunity costs at varying group sizes and the accompanying alpha and beta can be displayed. Such quantitative underpinning helps decision makers arrive at optimal campaign designs. In the case of cyclical campaigns, the historical response figures can be used as estimates of future response. Assuming unchanged response, exact group size calculations can be made. If no sound estimates are available beforehand, a number of scenarios can be calculated and compared for plausibility.
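As a rough illustration of such a power calculation, the sketch below uses the standard normal-approximation formula for comparing two proportions. It is not taken from the paper: the function name `group_size_per_arm` and the default alpha and power are assumptions, and it assumes two groups of equal size, whereas a real campaign would typically pair a large target group with a much smaller control group.

```python
from math import ceil, sqrt
from statistics import NormalDist


def group_size_per_arm(p1: float, p2: float, alpha: float = 0.05,
                       power: float = 0.80, two_sided: bool = True) -> int:
    """Approximate size of each of two equal groups needed to detect a
    difference between response rates p1 and p2 (normal approximation)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)  # power = 1 - beta
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Scenario comparison with hypothetical response rates:
# the smaller the expected difference, the larger the groups must be.
for expected_reference_rate in (0.030, 0.040, 0.045):
    n = group_size_per_arm(0.05, expected_reference_rate)
    print(expected_reference_rate, n)
```

Running a few such scenarios side by side makes the trade-off visible: halving the difference one wants to detect roughly quadruples the required group size, and with it the opportunity cost.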

This paper has put forward some basic design and evaluation methods. All calculations presented follow from elementary statistics. Together, they provide a framework with a simple goal: to make planning as well as evaluating marketing campaigns as simple as possible. The trade-offs have to do with reducing uncertainty when deciding about differences between groups. Incurring higher opportunity costs can reduce this uncertainty. Analysts should make the trade-offs between opportunity costs and uncertainty as clear as possible. The statistics in this paper are meant to translate essentially statistical conclusions into dimensions that are relevant to the business (response or revenues). By making these choices clear, the knowledge that is derived from running through campaign cycles ('closing the loop') can be enhanced. Thoughtful planning and design of multiple campaigns will result in an optimal mix of efficient targeting and learning. More efficient targeting essentially serves to drive down the cost of acquisition. Learning about execution helps to make marketing more effective by increasing the total number of potential buyers.

References and Notes

  1. Pyle, D. (2003) 'Business Modeling and Data Mining', Morgan Kaufmann, San Francisco, CA.
  2. Interestingly, this addition of finding customers that look like the ones who responded, but specifically the way they looked prior to responding, has to do with a confounding effect, too. It prevents cause and effect from being confounded. If one erroneously tries to find customers that look like the ones who have acquired the product (the way they appear after they responded), one can get disappointing results. Variables that are a consequence of product uptake and are used for prediction have been labelled 'leakers' (Berry and Linoff), or 'anachronistic variables' (Pyle). In such a case, confounding might consist of inadvertently interpreting the effect of taking the product as a cause (predictor) of the propensity to acquire the product.
  3. The ratio between these percentages is generally referred to as the lift of the model, a universal measure across algorithms to indicate the accuracy of the model.
  4. Mayer, U. and Sarkissian, A. (2003) 'Experimental design for solicitation campaigns', in Proceedings of KDD-03, AAAI Press, Menlo Park, CA, pp. 717–722.
  5. Calculating the target effect using the order recommended by the author is most convenient. Any other process would require some weighting calculation at evaluation time. Also, this procedure allows for straightforward comparison of random response rates over time as the campaign is rerun, regardless of changing targeting depth.
  6. Lift can be calculated here as the ratio between these two response percentages.
  7. Rosset, S., Neuman, E., Eick, U., Vatnick, N. and Idan, I. (2001) 'Evaluation of prediction for marketing campaigns', in Proceedings of KDD-01, AAAI Press, Menlo Park, CA, pp. 456–461.
  8. Piatetsky-Shapiro, G. and Masand, B. (1999) 'Estimating campaign benefits and modeling lift', in Proceedings of KDD-99, AAAI Press, Menlo Park, CA, pp. 185–193.
  9. Hand, D. (1997) 'Construction and Assessment of Classification Rules', Wiley, Chichester.
  10. Response in the reference group will be lower than in the target group. Although no marketing budget is spent on this group, this may still be seen as a loss. These opportunity costs are inevitable. If this is seen as prohibitive, there may, however, be a solution. Oftentimes, the marketing execution that was tested in the target group can be employed with this reference group at a later stage, after the tests have been run. In that case, the only cost is increased campaign complexity.
  11. This calculation rests on the simplifying assumption that if a new model were to be built today, its expected lift would be identical to the original lift of the current model. For 'normal' model degradation this seems reasonable; for changing realities and interrelations between central variables, this assumption is clearly questionable.
  12. Cohen, J. (1977) 'Statistical Power Analysis for the Behavioral Sciences', revised edn, Academic Press, New York, NY.

Appendices

Appendix A: Opportunity costs for reference and random group

Opportunity cost reference group

First determine revenue minus costs for the case without a reference group; we will call this [A]. The case with a reference group will be denoted by [B].

Let X be the fixed cost of running a campaign (eg design of creative), C the marginal cost of making an offer (eg incremental price of one extra mail piece), Y the yield per respondent, NT the number of customers in the target group, NR the number of customers in the random group and NF the number of customers in the reference group.

Then

[Equation image not available in the source: expressions for [A] and [B].]

Hence the opportunity cost for the reference group is given by

[Equation image not available in the source: opportunity cost of the reference group, [A] minus [B].]
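Since the expressions themselves cannot be read from the text, the following is only a plausible reconstruction from the definitions above, and not necessarily the author's exact notation. It assumes two symbols that do not appear in the appendix text: p_T, the response rate among targeted customers who receive the offer, and p_F, the (autonomous) response rate in the reference group. It also assumes that reference-group customers would have responded at rate p_T had they received the offer, which is reasonable because they are drawn at random from the same high-propensity group. Terms for the random group are identical in both cases and are left out.

```latex
\begin{align*}
\text{[A]} &= (N_T + N_F)\,p_T\,Y - X - C\,(N_T + N_F) \\
\text{[B]} &= N_T\,p_T\,Y + N_F\,p_F\,Y - X - C\,N_T \\
\text{[A]} - \text{[B]} &= N_F\,\bigl[(p_T - p_F)\,Y - C\bigr]
\end{align*}
```

Under these assumptions, the opportunity cost grows linearly with the size of the reference group and with the triggered response p_T − p_F, and is reduced by the mailing cost saved on the customers who are held out.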

Opportunity cost random group

First, determine revenue minus costs for the case without a random group; we will call this [C]. The case with a random group will be denoted by [D].

[Equation image not available in the source: expressions for [C] and [D].]

Hence the opportunity cost for the random group is given by

[Equation image not available in the source: opportunity cost of the random group, [C] minus [D].]
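As with the reference group, the original expressions cannot be read from the text, so what follows is a hedged sketch rather than the author's formula. It assumes two symbols not defined in the appendix text: p_T, the response rate among targeted customers who receive the offer, and p_R, the response rate in the random group. It further assumes that, without a random group, the same N_R offers would have gone to model-selected customers responding at rate p_T, so that the mailing volume and its cost are the same in both cases. A different assumption about what happens to those N_R offers would lead to a different expression. Terms for the reference group are identical in both cases and are left out. Note that C without brackets is the marginal cost per offer, while [C] in brackets labels the case without a random group.

```latex
\begin{align*}
\text{[C]} &= (N_T + N_R)\,p_T\,Y - X - C\,(N_T + N_R) \\
\text{[D]} &= N_T\,p_T\,Y + N_R\,p_R\,Y - X - C\,(N_T + N_R) \\
\text{[C]} - \text{[D]} &= N_R\,(p_T - p_R)\,Y
\end{align*}
```

Under these assumptions, the opportunity cost of the random group is the yield lost by mailing N_R randomly chosen customers instead of customers selected by the model. It therefore increases with the lift of the model.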
