Customer portfolio analysis: Crisp classification versus fuzzy classification – Based on the supermarket industry

Hiziroglu, Abdulkadir; Patwa, Jeeyan; Talwar, Vishal

doi:10.1057/jt.2012.5

Original Article
Published: 04 June 2012

Volume 20, pages 67–83 (2012)
Journal of Targeting, Measurement and Analysis for Marketing

Abstract

Acknowledging that traditional matrix-form customer portfolio models that result in crisp clusters are clouded with ambiguity, we propose the use of fuzzy clustering in customer portfolio analysis. This has been done in order to assist managers in better understanding their overall customer portfolio and reducing the effect of descriptive indicators. Our approach is tested on a supermarket data set of 3076 customers and its results are compared with a conventional customer portfolio matrix. A qualitative and quantitative assessment of the categorization generated both by our fuzzy clustering approach and the conventional matrix-based crisp clustering has been carried out along the following parameters: substantiality and balance of portfolio. The results show that the use of fuzzy clustering yields more substantial clusters, as well as a more balanced portfolio of customers. Although a particular portfolio matrix has been chosen for this research, the approach proposed here could be modified for use with other portfolio matrices.

INTRODUCTION

Managing customer portfolios has been a critical issue for many organizations. Researchers have proposed many different portfolio matrices to allow better categorization of customers in order to enable organizations to make more informed resource allocation decisions. However, many of the existing models are found to be limited in handling uncertain, ambiguous and incomplete data. The fuzzy set theory is one approach to reduce this ambiguity, which requires the modelling of the descriptors using fuzzy terms.

The purpose of this article is to show that fuzzy clustering yields a more refined classification, and may be more useful to organizations, than the crisp clustering approach driven by matrix-form portfolio models. To achieve this, a methodology rooted in the data-mining framework is followed, in which the data set is used to form crisp clusters based on the traditional matrix-form models and fuzzy clusters based on the fuzzy clustering approach. The critical issue is finding criteria for evaluating the clusters produced by the traditional matrix-form models against those produced by the proposed fuzzy clustering approach, in order to draw conclusions on which method yields better results.

The study commences with a review of the existing literature. It then describes the methodology employed for our empirical study and presents the results. The results are analysed using substantiality and balance of portfolio as the main criteria. The analysis of data is not limited to a quantitative assessment. The implications for marketers or managers are also discussed. It is concluded that the fuzzy clustering approach proves to be more useful to practitioners in reaching a more accurate representation of their customer portfolio.

LITERATURE REVIEW

In line with the ascent of relational approaches in marketing, studies dealing with customer portfolios have been conducted for a couple of decades.¹ Kundisch et al ² provide a comprehensive definition of customer relationship management (CRM). According to Kundisch et al,² CRM centres on the valuation, selection, acquisition, retention and development of durable customer relationships with the objective of allocating limited resources in order to maximize the value of a company.³ Ittner and Larcker⁴ suggest that proactive management of customer relationships can increase profitability of the firm as a whole. This makes CRM very important both from a business and academic perspective and this is reflected in the plentiful research carried out on this topic.

One of the concrete tools of CRM is customer portfolio analysis. The generic term ‘Portfolio Analysis’ refers to the process of reviewing a group of investments, usually with a view to making asset management or resource allocation decisions.⁵ In the customer portfolio context, these investments or assets are the customers of the focal company. Terho and Halinen⁶ define customer portfolio analysis as an activity by which a company analyses the current and future value of its customers for developing a balanced customer structure through effective resource allocation to different customers or customer segments.⁷

Many studies dealing with customer portfolios have been conducted since the 1980s. Table 1, adapted from Talwar et al,⁸ provides a summary of these studies. In line with contingency theory, some authors propose the need for different kinds of relationship marketing and management depending on the exchange context of the interacting firms.⁶ Thus, the context within which a particular model may be relevant is also stated, along with the variables that the model deals with. Empirical research conducted by different researchers to test the various models can also be found in the table. It should be noted that the traditional portfolio models allow a data point to be present in only one class at a time, hence the name crisp classification. For example, in the well-known Boston Consulting Group Matrix, a particular business (data point) can be a star, a cash cow, a question mark (also called a problem child) or a dog, but never more than one of these.

Table 1 Summary of different customer portfolio models

On the basis of the studies mentioned in Table 1, it can be noted that subjectivity and ambiguity of the different axes are a common problem when trying to operationalize most of the models. Gelderman¹⁷ finds the definition of the dimensions and constructs in the portfolio models to be ambiguous. Yorke and Droussiotis¹¹ go a step further and say that the customer portfolio models are limited to aiding visualization.

We propose the use of fuzzy clustering as a customer portfolio analysis tool that minimizes the ambiguity of the results. Fuzzy clustering is based on the fuzzy set theory that allows us to describe and treat imprecise and uncertain elements present in a decision-making problem.¹⁸,¹⁹ It may provide an alternative and convenient framework for handling uncertain project parameters (for example: project value, cost and so on) while there is lack of certainty in data or even lack of available historical data.²⁰

The most important and common fuzzy clustering algorithm is fuzzy c-means, which was first proposed by Dunn²¹ and then developed by Bezdek.²² It is based on an objective function, was developed from the traditional k-means method and was the starting point for all fuzzy clustering methods.²³ Fuzzy c-means algorithms have been applied and studied in different areas, and more details of their development can be found in Yang.²⁴ For the mathematical details, please refer to Bezdek²² and Dunn.²¹
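To make the idea of graded membership concrete, the following is a minimal pure-Python sketch of the standard fuzzy c-means iteration (alternating membership-weighted centre updates and membership updates). It is an illustration only: the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the Euclidean distance, the random initialization and the stopping tolerance are all assumptions made here, not the settings used in this study.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length numeric tuples.

    m (fuzzifier), iters, tol and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each customer's row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])
        # Membership update from relative distances to every centre.
        new_u = []
        for p in points:
            dists = [max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                     for ctr in centers]
            new_u.append([1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
                          for k in range(c)])
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

Unlike a crisp assignment, each row of the returned membership matrix sums to 1 and can place a customer partly in several clusters at once.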

The use of the fuzzy set theory is most prevalent in stock portfolio selection tools where it is used to reflect the vagueness and ambiguity of security returns when probability theory proves inadequate. This is evident in the multitude of research on the application of the fuzzy set theory in a financial stock portfolio context.²⁵,²⁶ Apart from stock portfolio selection, the fuzzy set theory has been employed to represent uncertain or flexible information in many types of applications such as scheduling, engineering design and production management.²⁰ Wang and Hwang²⁰ apply fuzzy logic to the process of R&D portfolio selection. Lin et al ²⁷ talk about the application of the fuzzy set theory to evaluate different projects, for example, R&D,²⁸ IT²⁹ and Operations Management.³⁰ Some applications of fuzzy logic are also seen in the marketing area, specifically in product innovativeness³¹ and new product development and project selection.³²

The use of the fuzzy set theory in a purely customer relationship context has received very limited academic attention. Zumstein³³ does focus on the application of fuzzy classification in CRM, but its specific application to customer portfolio analysis is not studied. Lin et al ²⁷ propose a systematic approach that incorporates the fuzzy set theory in conjunction with portfolio matrices to assist managers in reaching a better understanding of the overall competitiveness of their business portfolios. Even though customer relationships are a part of the study, fuzzy logic is not applied directly to the customer relations context. Starting from the suggestion that customers are the firm's most important ‘assets’³⁴ and linking it to the applicability of the fuzzy set theory to the selection of a portfolio of financial assets or stock, it is reasonable to suggest that the fuzzy set theory can find application in Customer Portfolio Analysis. This study is unique as it demonstrates a direct use of fuzzy clustering in customer portfolio analysis. Second, it draws a comparison between traditional matrix-form clustering and fuzzy clustering from a marketing point of view and, in doing so, it uses specific comparison criteria of substantiality and balance of portfolio. No study can be found that mentions bases of comparison common to both fuzzy and crisp clustering. Third, it discusses the managerial implications that the more substantial clusters and more balanced portfolio of customers can have for an organization.

METHODOLOGY

Proposed research model

The study makes use of transactional data of 3076 supermarket customers of a UK-based supermarket chain over a period of 4 months. A clustering task is undertaken on these customers using two different approaches – crisp clustering or traditional matrix-form clustering and fuzzy clustering. Certain comparison criteria are set such as substantiality and balance of portfolio that enable a comprehensive comparison between the two sets of produced clusters in order to provide an indication about which clustering approach is more useful to marketers.

The study involves the use of the data-mining methodology. CRM necessitates management and analysis of large volumes of market and consumer data using complex data management infrastructure and technology. Data mining acts as a helpful tool for organizations to discover meaningful trends, patterns and correlation in their customer data. These then enable them to drive improved customer relationships and to decrease the risk of business operations.³⁵

The variables used are shown in Table 2.

Table 2 Variables used

The details on the operationalization of each variable can be found in Appendix A.

Research questions and hypotheses

Before the clustering results can be compared, two main issues need to be addressed. First, it must be ensured that the data being used are not normally distributed, as this is a prerequisite for clustering. The specific research question that addresses this concern is:

1. Are the data that are obtained normally distributed?
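One way to screen a variable for departure from normality is sketched below. The text does not name the test that was applied, so the Jarque-Bera statistic is used here purely as a stand-in; the function names and the 5 per cent critical value for a chi-square distribution with 2 degrees of freedom (5.991) are choices made for this illustration.

```python
CHI2_2DF_5PCT = 5.991  # 5% critical value, chi-square with 2 degrees of freedom

def jarque_bera(values):
    """Return (JB statistic, skewness, excess kurtosis) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    return jb, skew, excess_kurtosis

def looks_non_normal(values):
    """True when the bell-shaped (normal) hypothesis is rejected at the 5% level."""
    return jarque_bera(values)[0] > CHI2_2DF_5PCT
```

Applied to each of ‘Value’, ‘Cost’ and ‘ATUQ’ in turn, a result of `True` would indicate rejection of the bell-shaped null hypothesis for that variable.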

Given that the data do not follow a normal distribution, the second concern is regarding the identifiability and distinguishability of the clusters from each other. It is necessary to check whether the clusters are identifiable and distinguishable in terms of the variables ‘Value’, ‘Cost’ and ‘ATUQ’. In other words, one needs to ascertain whether the average values of the variables are found to be different across the clusters. It must be noted that this is not a comparison between a fuzzy cluster and a crisp cluster but rather among the different crisp and fuzzy clusters. The research question addressing this issue is:

2. Can the clustering approach (crisp and fuzzy) produce clusters that are identifiable and distinguishable in terms of the variables?

Having addressed these issues, the study must deal with the primary issue of comparing the crisp clustering approach to the fuzzy clustering approach in order to see which one of the two is a more refined clustering technique from a marketer's point of view. This study draws comparison between the two approaches by using substantiality and balance of the portfolio as the criteria. The specific research questions concerning these are:

If the clusters exist, are the fuzzy clusters more useful to marketers than the crisp clusters?

1. Does fuzzy clustering produce more substantial clusters than traditional clustering?

2. Does fuzzy clustering lead to the formation of a more balanced portfolio than traditional clustering?

The hypotheses corresponding to the above-mentioned research questions are:

Hypothesis 1₀:

The data points show a bell-shaped distribution.

Hypothesis 2₀:

The crisp clusters are not identifiable and distinguishable in terms of the variables.

The fuzzy clusters are not identifiable and distinguishable in terms of the variables.

Hypothesis 3₀:

Fuzzy clustering does not produce clusters that are more substantial, in terms of size and value, than those produced by crisp clustering.

Hypothesis 4₀:

Crisp clustering produces a more balanced portfolio than fuzzy clustering.

Research procedure

The methodology follows six steps as illustrated in Figure 1:

Data Collection: The research sample used in this study comes from the retail industry. The data were procured from an online database and were collected from a supermarket chain in the United Kingdom.

Data Preparation: Three behavioural characteristics of customers – Recency (R), Frequency (F) and Monetary (M) value – were extracted along with the total number of unique products they buy. The necessary data were extracted using MySQL from the database.
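The extraction described in this step can be sketched as a single pass over transaction records. The record layout `(customer_id, purchase_date, product_id, amount)`, the function name `rfm_features` and the definition of frequency as the number of distinct purchase days are assumptions made for illustration; the operationalization actually used is the one given in Appendix A.

```python
from datetime import date

def rfm_features(transactions, as_of):
    """Aggregate (customer_id, purchase_date, product_id, amount) rows per customer.

    The row layout and the frequency definition are illustrative assumptions.
    """
    acc = {}
    for customer, day, product, amount in transactions:
        rec = acc.setdefault(
            customer, {"last": day, "days": set(), "spend": 0.0, "products": set()})
        rec["last"] = max(rec["last"], day)   # most recent purchase date
        rec["days"].add(day)                  # distinct shopping days
        rec["spend"] += amount                # total money spent
        rec["products"].add(product)          # distinct products bought
    return {
        customer: {
            "recency_days": (as_of - rec["last"]).days,
            "frequency": len(rec["days"]),
            "monetary": rec["spend"],
            "unique_products": len(rec["products"]),
        }
        for customer, rec in acc.items()
    }
```

The same aggregation could equally be written as a grouped SQL query; the Python form is used here only to keep the example self-contained.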

Data Pre-processing: Data pre-processing refers to the process of getting rid of noisy data or missing values by filtering or deleting them and doing some transformation or normalization that is crucial before mining the data.

The pre-processing steps employed for the data with their explanations are shown in Figure 2.

For a more detailed explanation on the pre-processing process, refer to Appendix B.

For this study, a simple random sampling method was used to extract the sample instead of using the whole database. Approximately 10 per cent of the database was used as the study sample. By means of a sampling code written using Microsoft SQL, a sample of 3076 customers was obtained for conducting the analyses. Three reasons justify this approach. First, the use of the entire database could have resulted in criticism of the study on the grounds of statistical validity. Second, it is believed that a solution achieved from using a certain sampling method can be as effective as using the whole database.³⁶ Third, using the entire database could have necessitated the use of high-powered computers, which were not available for this study.
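A simple random sample of this kind can be sketched in a few lines. The study drew its sample with SQL code; the Python version below, including the function name `sample_customers`, the fixed seed and the rounding rule, is only an illustrative equivalent.

```python
import random

def sample_customers(customer_ids, fraction=0.10, seed=42):
    """Draw a simple random sample (without replacement) of the given fraction."""
    ids = list(customer_ids)
    k = round(len(ids) * fraction)            # sample size, rounded to a whole customer
    return random.Random(seed).sample(ids, k)  # seeded so the draw is reproducible
```

For example, a hypothetical database of 30 760 customer identifiers would yield a sample of 3076 with the default fraction.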

Data Analysis – Matrix-form portfolio models: The crisp clustering exercise has been carried out using the RFM tool in the SPSS software. However, a slight modification has been made wherein instead of entering the RFM values as the variables, ‘value’, ‘cost’ and ‘ATUQ’ were entered.
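The crisp assignment can be pictured as scoring each variable into ordered bins and combining the three scores into one cell per customer. The sketch below uses independent rank-based quantile binning; the number of bins, the tie handling and the function names are assumptions for illustration and are not claimed to match the SPSS settings used in the study.

```python
def quantile_bins(values, bins=2):
    """Score each value from 1..bins by its rank; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores

def crisp_cells(value, cost, atuq, bins=2):
    """Return one (value_score, cost_score, atuq_score) cell per customer."""
    columns = [quantile_bins(col, bins) for col in (value, cost, atuq)]
    return list(zip(*columns))
```

Each customer lands in exactly one cell, which is what makes the resulting classification crisp.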

Data Analysis – Fuzzy clustering: Fuzzy c-means is the clustering method that is used. In order to perform the analysis, NCSS Data software is used. The parameters used are given in Appendix C.

Comparison of Results using Substantiality and Balance of Portfolio: The criteria chosen for comparing the two sets of clustering results are substantiality and balance of portfolio. Below is a brief description of the methods used to assess substantiality and balance of portfolio.

Substantiality can be measured in terms of number of customers belonging to each cluster, as well as the total value that each cluster accounts for. Value is usually measured in terms of a profit or a revenue function. However, in this study, the total value that each cluster contributes is measured in terms of the variable ‘Value’, which is a sum of the RFM values. The reason behind this is that monetary value is not the only factor that supermarkets want to drive; they also want their customers to be frequent and regular customers. For the crisp clusters, it is easy to find the number of customers belonging to each cluster and the corresponding value that they contribute. However, to count the number of customers belonging to each fuzzy cluster, one must set a benchmark membership degree or alpha-cut. If the membership degree exhibited by a customer exceeds this alpha-cut, then he/she must be accounted for in that cluster. Thus, the number of customers in each cluster and the value contributed by each cluster is obtained for both crisp and fuzzy clustering approaches and a comparison is drawn to decide which technique results in more substantial clusters.
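The counting rule described above can be sketched as follows. The function name `substantiality`, the default alpha-cut of 0.5 and the choice to credit a customer's full ‘Value’ to every cluster in which the alpha-cut is exceeded are assumptions made for this illustration.

```python
def substantiality(memberships, values, alpha=0.5):
    """Return (sizes, totals): customers and summed 'Value' per cluster.

    memberships: one row of membership degrees per customer.
    values: the 'Value' of each customer, in the same order.
    alpha: illustrative alpha-cut (an assumption, not the study's setting).
    """
    c = len(memberships[0])
    sizes = [0] * c
    totals = [0.0] * c
    for row, value in zip(memberships, values):
        for k, degree in enumerate(row):
            if degree > alpha:   # the customer counts towards cluster k
                sizes[k] += 1
                totals[k] += value
    return sizes, totals
```

Crisp clusters can be passed through the same function by giving each customer a row with 1 for the assigned cluster and 0 elsewhere, which keeps the two approaches comparable. With a low alpha-cut a customer may be counted in more than one fuzzy cluster.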

This test is based on a study by Hiziroglu.³⁷ However, it has been modified to suit the clustering context from the segmentation context where it has been earlier applied.

The objective is to determine which of the two clustering approaches, fuzzy or crisp, produces a better-balanced portfolio of customers. Zumstein³³ uses the following formula to calculate the balance error in a portfolio:

where Norm(O_i|C_1) is the normalized membership degree of customer i with respect to cluster 1, OW_c1 is the optimal weight allocated to cluster 1, and BE is the balance error.

Even though this formula is specific to fuzzy portfolios owing to the presence of membership degrees as a variable, it has been used in our study to calculate the balance error for both fuzzy and crisp clusters. For crisp clusters, the membership degree of a customer belonging to crisp cluster 1 is counted as 1 and that of customers belonging to other clusters as 0 while computing the first term of the equation. For the second term, membership degrees of customers belonging to crisp cluster 2 are counted as 1 and the others as 0, and so on for the remaining terms. Our study uses a modified version of the equation used by Zumstein.³³ The above-mentioned formula is converted into an objective function that aims to minimize the balance error for the crisp clusters and calculates the optimal weights for each of the crisp clusters. Similarly, an objective function that minimizes the balance error for the fuzzy clusters is devised and the corresponding optimal weights for the fuzzy clusters are found. The constraints for these objective functions are that the optimal weights should be between 0 and 1 and the sum of the optimal weights should not exceed 1. Using the Microsoft Excel Solver, the minimized balance error and corresponding weights for both the crisp and fuzzy approaches are found. Zumstein³³ states that an optimally well-balanced portfolio has a balance error equal to 0. The minimized balance errors for the crisp portfolio and for the fuzzy portfolio are compared to decide which approach yields a balance error closer to 0, and thereby allows the creation of a well-balanced portfolio.
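The 0/1 encoding that lets crisp clusters be scored with a membership-based formula can be sketched as below. The helper names are hypothetical, and `balance_error` is only an assumed stand-in (a sum of absolute deviations between each cluster's normalized membership mass and its weight). It is not the formula from Zumstein, and it does not reproduce the constrained minimization carried out in Excel Solver.

```python
def one_hot(labels, c):
    """Encode crisp cluster labels (0..c-1) as membership rows of 1s and 0s."""
    return [[1.0 if k == label else 0.0 for k in range(c)] for label in labels]

def cluster_shares(memberships):
    """Normalized membership mass per cluster; the shares sum to 1."""
    c = len(memberships[0])
    column_sums = [sum(row[k] for row in memberships) for k in range(c)]
    total = sum(column_sums)
    return [s / total for s in column_sums]

def balance_error(shares, weights):
    """ASSUMED stand-in: total absolute deviation of shares from the weights."""
    return sum(abs(s - w) for s, w in zip(shares, weights))
```

Because fuzzy membership rows and one-hot crisp rows have the same shape, both kinds of portfolio can be passed through the same scoring function, which is the point of the encoding.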

RESULTS OF THE ANALYSES

This section presents the results of the analyses conducted on the data. A brief insight into the descriptive statistics is provided followed by the presentation and interpretation of the clustering results. The results are compared using substantiality and balance of portfolio as evaluation criteria and the managerial implications for more substantial clusters and a well-balanced portfolio are discussed.

Descriptive statistics

The descriptive statistics provide a better understanding of the empirical data that are used in the study. Some of the descriptive statistics are shown in Table 3.

Table 3 Descriptive statistics

It can be seen that the mean values for the variables ‘Value’, ‘Cost’ and ‘ATUQ’ are 0.28023, 0.00246 and 0.14271, respectively.

A prerequisite for the clustering is that the data used must not be normally distributed. Hence, it is imperative to verify that the data employed in this study do not follow a normal distribution. The hypothesis for the first research question is: