Comparison of continuous and discrete representations of unobserved heterogeneity in logit models

  • Original Article
  • Journal of Marketing Analytics

Abstract

Representing unobserved heterogeneity, or taste variation, is receiving increasing attention in consumer-choice modeling for marketing analytics. The mixed logit (MXL) model, which incorporates random coefficients into the multinomial logit model, has been widely adopted for this purpose. The most common approach assumes that the random coefficient follows a continuous, unimodal distribution; the parameters of that distribution, together with the other model parameters, are then obtained by maximum simulated likelihood estimation. In this article, we refer to this approach as the continuous mixed logit (CMXL) model. It requires the a priori assumption that the distribution of the random coefficient is continuous and, usually, unimodal. One way to relax this assumption is to estimate the distribution nonparametrically by assuming a discrete distribution with finite support. We refer to this approach as the discrete mixed logit (DMXL) model. Within the DMXL family, we propose the mass-point MXL (MPMXL) model as an alternative to the continuous-distribution assumption and compare its performance with that of the latent class logit model (LCLM), which also belongs to the DMXL family. Either model can be used to represent unobserved heterogeneity with a discrete distribution in the parameter space. We conduct empirical analyses that compare the continuous and discrete representations of unobserved heterogeneity in logit models, using simulated data with known parameters and real discrete-choice data. The analysis with simulated data provides insight into the ability to distinguish between continuous and discrete parameter distributions, and a better understanding of the goodness-of-fit measures used to evaluate model performance with real data.
From the simulation study, we find that when the data are generated from a normal distribution, the CMXL model with the unimodal-distribution assumption is preferred to the DMXL model. From the real data analysis, we find that the CMXL model fails to recover heterogeneity that is identified by the DMXL model. In conclusion, we suggest that when estimating a random-coefficient MXL model, one should start with a CMXL model, but should not accept a ‘no heterogeneity’ conclusion without estimating a series of DMXL models, using either the mass-point MXL model or the LCLM, with different starting values.
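
To make the distinction between the two representations concrete, the following is a minimal illustrative sketch in Python, not the estimation code used in the article. It assumes a single random coefficient and one attribute per alternative, and it only evaluates choice probabilities for given parameters; it does not estimate them. The function names (`logit_probs`, `cmxl_probs`, `dmxl_probs`) and all numerical values are hypothetical. The continuous case averages logit probabilities over pseudo-random normal draws, as in simulation-based estimation, while the discrete case takes an exact weighted sum over a finite set of mass points.

```python
import numpy as np


def logit_probs(x, beta):
    """Multinomial logit probabilities for one choice situation.

    x: array of shape (J,), one attribute value per alternative.
    beta: scalar coefficient on that attribute.
    """
    v = x * beta
    v = v - v.max()  # subtract the max utility for numerical stability
    e = np.exp(v)
    return e / e.sum()


def cmxl_probs(x, mu, sigma, n_draws=2000, seed=0):
    """Simulated probabilities when beta ~ N(mu, sigma^2) (continuous mixing)."""
    rng = np.random.default_rng(seed)
    betas = mu + sigma * rng.standard_normal(n_draws)
    # Average the logit probabilities over the draws of beta.
    return np.mean([logit_probs(x, b) for b in betas], axis=0)


def dmxl_probs(x, points, weights):
    """Exact probabilities when beta takes finitely many values (discrete mixing).

    points: candidate values of beta (mass points).
    weights: probability attached to each mass point; assumed to sum to 1.
    """
    return sum(w * logit_probs(x, b) for b, w in zip(points, weights))


# Illustrative inputs only (assumed, not taken from the article).
x = np.array([0.0, 0.5, 1.0])
p_continuous = cmxl_probs(x, mu=-1.0, sigma=0.5)
p_discrete = dmxl_probs(x, points=[-1.5, -0.5], weights=[0.5, 0.5])
print(p_continuous, p_discrete)
```

In both cases the model collapses to the ordinary multinomial logit when the mixing distribution is degenerate, that is, when `sigma` is zero or there is a single mass point.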


Notes

  1. However, it limits the distribution to a small number of discrete points rather than a continuous distribution.

  2. When estimating with only one parameter following a discrete distribution, the MPMXL model and LCLM are identical.

  3. Another data set is generated with larger variance, β1~N(−6, 25), and the results are similar, except that the LCLM has three mass points. As it doesn’t provide more instructive conclusions than what is presented in the article, we omit it.

  4. If μ and σ² denote the mean and variance of the underlying normal distribution, the mean and variance of the lognormal distribution are exp(μ+σ²/2) and exp(2μ+2σ²)−exp(2μ+σ²).

  5. Data provided by the New York Metropolitan Transportation Council (NYMTC).
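
The lognormal moment formulas in Note 4 can be checked numerically. The sketch below is illustrative only: the values of μ and σ² are assumed for the example and are not taken from the article, and `lognormal_moments` is a hypothetical helper name. It compares the closed-form mean and variance with sample moments of exponentiated normal draws.

```python
import math

import numpy as np


def lognormal_moments(mu, sigma2):
    """Closed-form mean and variance of exp(Z) with Z ~ N(mu, sigma2)."""
    mean = math.exp(mu + sigma2 / 2.0)
    var = math.exp(2.0 * mu + 2.0 * sigma2) - math.exp(2.0 * mu + sigma2)
    return mean, var


# Assumed illustrative parameters of the underlying normal distribution.
mu, sigma2 = 0.5, 0.25
mean, var = lognormal_moments(mu, sigma2)

# Monte Carlo check: exponentiate normal draws and compare sample moments.
rng = np.random.default_rng(0)
draws = np.exp(rng.normal(mu, math.sqrt(sigma2), size=200_000))
print(mean, draws.mean())
print(var, draws.var())
```

The variance expression is algebraically the same as the more common form (exp(σ²)−1)·exp(2μ+σ²).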


Author information

Corresponding author

Correspondence to Xiaojing Dong.

About this article

Cite this article

Dong, X., Koppelman, F. Comparison of continuous and discrete representations of unobserved heterogeneity in logit models. J Market Anal 2, 43–58 (2014). https://doi.org/10.1057/jma.2014.5


  • DOI: https://doi.org/10.1057/jma.2014.5
