INTRODUCTION

The collaborative web, or Web 2.0 as it is commonly known, has provided organizations with additional touchpoints for interacting with consumers, thereby aiding customer centricity. Organizations frequently use corporate blogs to increase the value perceived by consumers by showcasing organizational, product and brand-related achievements, thereby enhancing the organization or brand image, aiding consumer learning and strengthening the consumer–brand relationship. Using posts on corporate blogs to apprise consumers of new promotional campaigns and product launches is fast becoming a useful marketing and Customer Relationship Management (CRM) strategy. Consumers can use the comments feature to interact with the organization, express their opinions, voice dissent, provide feedback, post product or service-related complaints and so on, thereby increasing consumer involvement. This provides opportunities for mining consumer-generated data to extract knowledge about customers and convert it into actionable customer intelligence by capturing certain aspects of human behavior.

This opinion article addresses the research opportunities and business implications of extracting consumer related information for consumer profiling from the comments posted by consumers in response to organizational posts on a corporate blog. The consumer responses can be mined to gauge consumer sentiment by using relevant sentiment mining tools and to serve as a decision support system for better segmentation and response management. Data mining concepts of clustering and nearest neighbor technique, under the aegis of memory-based reasoning, can aid allocation of consumers to respective clusters and prediction of behavior of new consumers as and when they enter the system.

Our main contribution in this work lies in extracting ‘word level’ sentiment from a consumer comment and demonstrating that individual sentiment-bearing words can be used to understand a consumer's inclination to buy or the state of his relationship with an organization. Scores obtained from Sentiwordnet, which is essentially the English wordnet with polarity scores attached to the synsets, are used to weight the features, that is, the words of a comment.

CORPORATE BLOGGING

Web 2.0 is a collection of open-source, interactive and user-controlled online applications expanding the experiences, knowledge and market power of the users as participants in business and social processes.1 Web 2.0 tools also represent a significant opportunity for organizations to build new social and web-based collaboration, productivity, and business systems, and to improve cost and revenue returns.2 These tools of the collaborative web have found applications in the corporate sector in the domains of Marketing, Brand Promotion and Customer Relationship Management. Web 2.0 also appears to have a substantial effect on CRM behavior and on new challenges facing strategists and marketers.3 Corporate blogs, online communities, social networks, wikis, micromedia and folksonomies are some of the Web 2.0 concepts being used by businesses.

The dictionary meaning of a blog is a frequent, chronological publication of personal thoughts and links. As millions of people use blogs as personal diaries on the internet, blogs are becoming collaborative spaces that can be put to multiple uses and have emerged as the latest mode of computer-mediated communication.4 This concept has found widespread acceptance in the corporate world with the emergence of ‘corporate’ or ‘organizational’ blogs. Such blogs are written by people who blog in an official or semi-official capacity at a company, or who are so closely associated with the company where they work that, even though they are not officially its spokespeople, they are clearly affiliated5 and endorsed, explicitly or implicitly, by the company. Also termed a hybrid of the personal blog6, they are increasingly being explored by public relations practitioners and feature the insights, assessments, commentary, and other discourse devoted to a single company. Organizational blogs seem to appear at the intersection of personal reflection and professional communication. They have evolved from both online and offline modes of communication and have characteristics of both personal and professional communication.7 Posts in blogs are tagged with keywords, which allows content to be categorized and accessed through a theme-based classification system. Linking is also an important part of the blogging activity as it deepens the conversational nature of the blogosphere and its sense of immediacy.8 An effective blog fosters community and conversation9, drives traffic to the product website and serves as a medium for interaction with consumers, thereby shaping consumer perception, eliciting responses and, through a two-way exchange of thoughts, fostering a connection with the consumers.
Blogs have a comparative advantage of speedy publication: they have a first-mover advantage in socially constructing interpretive frames for current events.10 Blogs are no longer a subculture of the Internet; they have become a mainstream information resource. External corporate blogs are primarily tools used by organizations to interact with consumers, partners, marketing intermediaries, associates and components of the external environment, viz. media, government agencies and other general bodies. They offer a more up-to-date view of the organization than traditional communication channels. Tapping into this new channel to listen to and interact with their customers requires new initiatives from corporations.11 Blogs further provide a tremendous opportunity for forward-thinking companies and management to have a significant positive impact on their public perception. People who read organizational blogs rate an organization's relational maintenance strategies more highly than those who read traditional web content only, which makes a blog a useful tool for creating and maintaining value-laden relationships with current and potential customers. Launching a corporate brand blog is representative of an organizational desire to share information and engage in a conversation. This is especially true when the blog allows visitors to post their own comments. The informality of communication helps companies build trust12, converse with people and even manage public perception by posting suitable responses. The ability of a blog to induce consumer participation by encouraging consumers to comment on the posts hosted by the organization creates a dialogue and helps the organization achieve consumer engagement. These web-based interactions can help reduce the perceived indifference of a company and, at the same time, reinforce a customer's purchase decision by offsetting the feeling of cognitive dissonance.13

While the ability of a blog to achieve higher engagement in terms of the volume of comments is significant, of greater importance is the knowledge capital created through exchange with consumers, which can be mined to extract explicit information that the organization can leverage as a decision support system14 for consumer segmentation and strategy formulation. The advantage of blogs is that posts and comments are easy to reach and follow because of centralized hosting and generally structured conversation threads. Currently, all major browsers support RSS technology, which enables readers to easily access posts without actually having to visit the blogs. From a blogging perspective, benefits to users are social as well as informational. Connecting with their community is an important value sought by all types of users, and heavy users of the system realize the greatest benefits.15 Corporate blogging is primarily about three attributes: information, relationships and knowledge management. Although there are many different types of corporate blogs, most can be categorized as either Internal or External. For the purpose of this study, we focus on External Blogs being used by organizations to build brand relationships with consumers and induce participation and engagement.

CRM

CRM, which has also been described as ‘information-enabled relationship marketing’16, is an enterprise-wide initiative that belongs to all areas of an organization.17 It comprises processes used by organizations to manage consumer relationships, including collecting, storing and analyzing data, and is often termed data-driven marketing. CRM attempts to provide a strategic bridge between information technology and marketing strategies aimed at building long-term relationships and profitability. This requires ‘information-intensive strategies’.18 It is vital to maintain appropriate Customer Information Management systems by acquiring customer databases and consolidating customer feedback.

Companies interact with customers, treat them as organizational assets, learn about them and, through the process of incorporating feedback and co-creation, develop a level of intimacy with them. This serves the objective of better prioritization of marketing investment, as improved marketing intelligence aids firms in improving the selling context. Organizational processes need to change in such a way that the organization can recognize individual customers and extract information on who they are and what they want.19 In this context, we attempt to use the concepts of knowledge extraction and data mining to extract consumer-related information from comments posted by consumers on a corporate blog.

Knowledge discovery and data mining

Knowledge discovery is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns from large collections of data.20 Data mining is concerned with the actual extraction of knowledge from data.21 The web captures several aspects of human endeavors and provides a fertile ground for data mining, which is playing an important role in meeting the challenges of the intelligent web.22 In the case of a corporate blog, the purpose of the analysis would be to gain insight into the consumer thought process, thereby enabling prediction of consumer behavior, creation of consumer segments23, identification of consumers at risk of churn, and analysis of responses to campaigns and retention strategies. Data mining applications perform the analysis and extract relevant consumer information.

Knowledge Discovery and Data Mining (KDD) is an interdisciplinary area focusing on methodologies for extracting useful knowledge from data for Business Intelligence. The ongoing rapid growth of online data due to the Internet24 and the widespread use of databases have created an immense need for KDD methodologies. The challenge of extracting knowledge from data draws on research in a wide variety of fields to produce tools that can synthesize and organize knowledge on any given topic of interest from a corpus of documents or other content. There is an increasing realization that effective CRM25 is possible only with a true understanding of the needs and preferences of the customers. In this context, data mining tools can help uncover hidden knowledge in online content, thereby enabling a better understanding of the consumer, and a systematic knowledge management effort can channel that knowledge into effective marketing strategies.

Customer Profiling26 is one of the major areas of the application of data mining for knowledge-based marketing. This is of relevance because consumer behavioral data27 is a more valuable source of information than consumer demographic data. We make use of the following data mining techniques for our discussion.

Sentiment mining

Sentiment mining is a computational approach used to identify expressions made about topics within a span of text.28 With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. These contents have been recognized as measurable resources and various opinion mining methods have been developed to analyze the contents.29 The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object.30 Given an opinionated piece of text wherein it is assumed that the overall opinion in it is about one single issue or item, it is possible to classify the opinion as falling under one of two opposing sentiment polarities, or locate its position on the continuum between these two polarities.31 In case of document mining, the semantic orientation of the terms in a document of unknown sentiment is added up, and if the overall score is positive, the document is classified as being of positive sentiment, otherwise it is classified as negative.32 This concept of sentiment mining is adapted for mining sentiment of consumers, as represented by their comments under an organizational blog post to determine positive or negative sentiment polarity.

Methodology

i) Research Instrument and Sampling Technique: An evaluation grid was developed to help consumers link a predefined set of sentiment-bearing words with the consumer objectives, namely an inclination to make a purchase or the state of the consumer's relationship with the organization. A focus group of 20 consumers was given the evaluation grid and asked to link the sentiment-bearing words with the respective consumer objective.

ii) Factor analysis: A factor analysis was used to load the sentiment-bearing words onto the factors of Liking, Satisfaction, Involvement I and Involvement II.

iii) Hierarchical Cluster Analysis: Sentiment scores of 18 consumers who posted comments on a post on the corporate blog of Southwest Airlines were subjected to Hierarchical Cluster Analysis, and consumer clusters for the segments were extracted.

Evaluating consumer sentiment using Sentiwordnet 1.0

For evaluating consumer sentiment we use Sentiwordnet 1.033, a lexical resource in which each WORDNET synset is associated with three numerical scores, Obj(s), Pos(s) and Neg(s), describing how objective, positive and negative the terms contained in the synset are. The method used to develop SENTIWORDNET is based on the quantitative analysis of the glosses associated with synsets and on the use of the resulting vectorial term representations for semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but different classification behavior.34 If comments are treated as sets of opinionated text, with the assumption that the text (each set of comments on a single post) is related to a single issue or item, the opinion can be classified as positive or negative, or placed somewhere on the continuum between these two polarities. This can be done by converting each comment into a feature vector using a text processing tool and then identifying the sentiment-bearing features. By using a sentiment mining tool in which each opinionated word has been allocated a sentiment score on the basis of its wordnet synset, a sentiment score can be calculated for each individual comment. In this context, term occurrence rather than term frequency has to be used as the indicator, because in traditional sentiment classification repeated occurrence of a term does not emphasize or change the sentiment polarity. The score can then be calculated by taking the algebraic sum of the term orientations as representative of the sentiment behind the comment. It is important here to map each term to the correct wordnet synset, as that holds the key to the score. Volumes of consumers depicting positive and negative sentiment polarity are calculated. For instance, consider a consumer comment which reads,

Great idea. This looks like a smart, good deal.

Calculating the composite sentiment for this consumer comment involves extracting the sentiment-bearing words: great, smart and good. The sentiment scores extracted for these words from Sentiwordnet 1.0 are 0.625, 0.5 and 0.5, respectively. The composite sentiment score for this consumer is calculated as 0.541, which corresponds to the mean of the three word scores (1.625/3 ≈ 0.5417, truncated to three decimal places).
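The calculation for this comment can be sketched in a few lines of Python. This is only an illustrative sketch: the `LEXICON` dictionary and the `composite_score` function are hypothetical names, the dictionary holds just the three scores quoted above rather than a real Sentiwordnet lookup, and averaging the word scores is an assumption inferred from the reported value of 0.541. Negative words would need signed scores (for example Pos(s) minus Neg(s)), which is also an assumption.

```python
import re

# Hypothetical mini-lexicon: only the three scores quoted in the text.
# A real run would look each word up in its Sentiwordnet synset.
LEXICON = {"great": 0.625, "smart": 0.5, "good": 0.5}


def composite_score(comment, lexicon=LEXICON):
    """Mean score of the distinct sentiment-bearing words in a comment."""
    # A set records term occurrence, not term frequency: a repeated
    # word is counted once.
    tokens = set(re.findall(r"[a-z']+", comment.lower()))
    scores = [lexicon[t] for t in sorted(tokens) if t in lexicon]
    # A comment with no sentiment-bearing word is treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


score = composite_score("Great idea. This looks like a smart, good deal.")
print(round(score, 4))
```

Because a set of tokens is used, writing "good" three times in one comment yields the same score as writing it once, which mirrors the term-occurrence rule described earlier.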

Feasibility of method

Subjective experiment

To evaluate the feasibility of the method, a set of consumers were asked to link 17 sentiment-bearing words with the two constructs – Consumer's inclination to buy and Consumer's state of relationship with the organization.

A factor analysis helps load the respective sentiment-bearing words onto four different factors: Liking, Satisfaction, Involvement I and Involvement II (Figure 1).

Figure 1: Linking sentiment bearing words.

A consumer passes through several stages, viz. liking, satisfaction and involvement, in his relationship with the organization. Liking can be defined as a state of fondness, affection or preference for a product, brand or organization. This is a preliminary stage in which the consumer develops a tertiary interest in a product. A consumer moves to the next stage when he starts perceiving greater value in an organizational offering. Customers then equate the perceived value with perceived quality, and customer satisfaction is enhanced as a result. Consumers also tend to express their happiness and appreciation in the relationship with the organization and brand. This expression can be treated as representative of consumer satisfaction, and IT tools such as the sentiment mining approach described above can help capture it. Consumer involvement is the perceived personal importance and/or interest attached to the acquisition, consumption and disposition of a good, service or idea. As involvement increases, the consumer has greater motivation to comprehend and elaborate on information. Consumers who express doubt or worry regarding the product, brand, features and so on are considered to be on the low end of the involvement scale. The next level represents consumers who have a complaint and seek grievance redressal.

Considering that the focus group of consumers could relate to the sentiment bearing words and further allocate people to groups based on their presumed state of relationship with the organization, quantification of the ‘sentiment’ and the subsequent use for consumer clustering appeared feasible.

Reliability of the test instrument was studied using the test–retest method, wherein the entire procedure was repeated on the same person twice. The Spearman–Brown coefficient was 0.957 and Guttman split-half coefficient was 0.955.

Consumer segmentation based on consumer sentiment score

By using corporate blogs for segmentation, marketing managers can segment customers (and prospective customers) into smaller groups and then specify the interaction that should take place with those individuals. Segmentation is the process of identifying groups of customers around whom to conduct marketing efforts by analyzing the existing customer base. It is a very important functionality of any tool as this allows the marketing manager to fine tune the deliverables of the campaign. While this helps in identifying most appropriate targets for specific campaigns by understanding a consumer's relationship with the organization, it also aids the process of consumer retention by identifying consumer groups that need special attention or redressal.

Traditionally, only a few broad segments could be defined, based on overall demographic information. As the volumes of data collected internally have grown, it has become possible to define many more segments at a finer level of granularity. In addition, it is now possible to define the segments based on their actual interaction with the company (rather than general demographic information) and to automate different responses to each segment. Consumer comments on organizational blog posts conceal a wealth of information. While consumers with positive sentiment polarity can be subjected to consumer acquisition strategies, consumers with negative polarity represent a state of consumer dissatisfaction and can be subjected to strategies for consumer retention.

Using a corporate blog for consumer profiling

Consumer profiling involves creating consumer models on the basis of which a marketer can decide on the right strategies and tactics to meet the needs of the customer. Profiling is a natural tool for predicting consumer behavior and preferences. Consumer profiles can be formed from purchase data or any other behavioral data. In the context of a corporate blog, the complete sample of consumers under a single organizational post can be subjected to sentiment mining and their individual sentiment scores can be extracted. Each consumer comment can be considered representative of the mindset of one single consumer and can deliver one specific consumer sentiment score. These sentiment scores can form the basis of consumer profiling. This can be done by subjecting the consumers to univariate Hierarchical Cluster Analysis on the basis of the sentiment scores, which enables extraction of separate consumer clusters. Cluster Analysis, also called data segmentation, relates to grouping or segmenting a collection of objects (also called observations, individuals, cases, or data rows) into subsets or ‘clusters’, such that those within each cluster are more closely related to one another than to objects assigned to different clusters. Hence, objects in a cluster are similar to each other and dissimilar to objects in other clusters. Clustering algorithms aim to maximize intra-cluster similarity and minimize inter-cluster similarity.

Classification of consumers under separate clusters can help organizations target them appropriately. For instance, if members of Cluster 1 depict consumers with negative sentiment, thereby indicating a state of dissatisfaction, the organizational targeting effort will need to be different from the targeting effort to be used for a cluster where members depict very high positive sentiment.

Collection of data from one sample campaign and subsequent cluster analysis is demonstrated

Sentiment scores for 18 consumers who posted comments on a post on the corporate blog of Southwest Airlines were subjected to Hierarchical Cluster Analysis. Three clusters were identified (Table 1). The dendrogram is shown below (Figure 2). Aaron, Bill and Bob belong to Cluster 1, while Drew belongs to Cluster 3. All remaining members belong to Cluster 2. The proximity matrix showed the same grouping. Classification of consumers under separate clusters based on their sentiment scores can help organizations target them appropriately. Members of Cluster 1 depict consumers with negative sentiment, thereby indicating a state of dissatisfaction, while members of Cluster 2 show a low level of liking, where the organization needs to work to build a greater rapport. Members of Cluster 3 depict a certain degree of satisfaction and can be treated separately.

Table 1: Cluster membership.

Figure 2: Dendrogram using single linkage.
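The clustering step can be illustrated with a small pure-Python sketch of agglomerative clustering with single linkage on univariate scores. The consumer labels and scores below are invented for illustration and are not the Southwest Airlines data; `single_linkage` is a hypothetical helper, and a statistics package would normally be used instead.

```python
def single_linkage(scores, k):
    """Group named 1-D scores into k clusters by single-linkage merging."""
    clusters = [[name] for name in scores]  # start with one cluster each

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(scores[x] - scores[y]) for x in a for y in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)

    # Order clusters from most negative to most positive sentiment.
    clusters.sort(key=lambda c: min(scores[n] for n in c))
    return [sorted(c) for c in clusters]


# Hypothetical sentiment scores for eight commenters.
scores = {"c1": -0.40, "c2": -0.35, "c3": -0.30, "c4": 0.10,
          "c5": 0.15, "c6": 0.20, "c7": 0.25, "c8": 0.75}
result = single_linkage(scores, 3)
print(result)
```

With these invented scores the two largest gaps (between c3 and c4, and between c7 and c8) separate a negative group, a mildly positive group and a single strongly positive consumer.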

Further, as and when a new customer enters the system, the nearest neighbor approach can enable allocation of the consumer to the right cluster and ensure that he is targeted appropriately. The human ability to reason from experience depends on the ability to recognize appropriate examples from the past. Identification of similar cases from experience and application of knowledge of those cases to the problem at hand is the essence of memory-based reasoning. A distance function can be used to allocate every new entrant to the appropriate cluster.
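A minimal sketch of this allocation step follows, assuming the distance function is the absolute difference between sentiment scores and that clusters are stored as lists of member scores. The cluster contents and the `assign_cluster` name are hypothetical and chosen only for illustration.

```python
def assign_cluster(new_score, clusters):
    """Return the label of the cluster holding the nearest stored score."""
    # Distance function (an assumption): absolute difference of scores.
    return min(
        clusters,
        key=lambda label: min(abs(new_score - s) for s in clusters[label]),
    )


# Hypothetical clusters of previously scored consumers.
clusters = {
    1: [-0.40, -0.35, -0.30],      # negative sentiment
    2: [0.10, 0.15, 0.20, 0.25],   # mildly positive sentiment
    3: [0.75],                     # strongly positive sentiment
}

print(assign_cluster(-0.20, clusters))
print(assign_cluster(0.30, clusters))
print(assign_cluster(0.60, clusters))
```

The remembered cases play the role of the "memory" in memory-based reasoning: no model is fitted, and each new consumer is simply compared with the stored scores.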

Implications for marketing intelligence

i) Considering a single organizational post on a corporate blog as representative of one organizational campaign (promotional or otherwise), the mean of the consumer sentiment scores for that organizational post will indicate the average sentiment of the entire population for that promotional campaign.

ii) The degree of positivity or negativity of this mean score can help the marketing managers gain insight into the consumer thought process pertaining to a particular campaign.

iii) Separate strategies can be developed for targeting of respective consumer segments.

iv) Consumer behavior is a function of sentiment state indicating relationship level with the organization/brand or product.

v) Consumer behavior of a new consumer entering the system (responding on a corporate blog) can be predicted by identifying his sentiment score and then allocating him to a respective cluster.
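The campaign-level reading in the first two points can be sketched as follows. The scores are invented, `campaign_summary` is a hypothetical helper, and treating a score of exactly zero as neutral is an assumption.

```python
from statistics import mean


def campaign_summary(scores):
    """Mean sentiment and polarity volumes for the comments on one post."""
    return {
        "mean": mean(scores),                      # average sentiment
        "positive": sum(s > 0 for s in scores),    # volume of positive comments
        "negative": sum(s < 0 for s in scores),    # volume of negative comments
    }


# Hypothetical sentiment scores for the comments under a single post.
summary = campaign_summary([0.5, -0.25, 0.25, 0.5])
print(summary)
```

A positive mean with a small negative volume would suggest a well-received campaign, while the negative commenters remain identifiable for separate treatment.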

CONCLUSIONS

Segmentation of consumers is a significant approach to consumer acquisition. Similarly, customer attrition is an important issue for any company, and especially for mature industries where the initial period of exponential growth has been left behind. Organizations can maintain good consumer relationships through better content management.35 They can also make use of the information available about their prospective and current customers by structuring and mining the vast volumes of data available on the web, and formulate strategies for consumers by segregating them on the basis of factors such as the sentiment score discussed above. While consumers depicting a positive sentiment polarity can be grouped and subjected to specific targeting campaigns, consumers with negative sentiment polarity, who are most likely to defect from the company, can be subjected to a well-directed retention campaign. Equipped with the ability to group consumers under different segments, marketing managers can plan well-directed and well-differentiated marketing strategies for a better return on investment.