Summer 2004, Volume 3, Number 2, Pages 80-95
Original Article
Mapping nominal values to numbers for effective visualization
Geraldine E Rosario, Elke A Rundensteiner, David C Brown, Matthew O Ward and Shiping Huang

Computer Science Department, Worcester Polytechnic Institute, Worcester, U.S.A.

Correspondence to: Matthew O Ward, Computer Science Department, Worcester Polytechnic Institute, Worcester, U.S.A. E-mail: matt@cs.wpi.edu

Abstract

Data sets with a large number of nominal variables, including some with a large number of distinct values, are becoming increasingly common and need to be explored. Unfortunately, most existing visual exploration tools are designed to handle numeric variables only. When data sets with nominal values are imported into such visualization tools, most solutions to date are rather simplistic. Often, techniques that map nominal values to numbers do not assign order or spacing among the values in a manner that conveys semantic relationships. Moreover, displays designed for nominal variables usually cannot handle high-cardinality variables well. This paper addresses the problem of how to display nominal variables in general-purpose visual exploration tools designed for numeric variables. Specifically, we investigate (1) how to assign order and spacing among the nominal values, and (2) how to reduce the number of distinct values to display. We propose a new technique, called the Distance-Quantification-Classing (DQC) approach, that preprocesses nominal variables before they are imported into a visual exploration tool. In the Distance Step, we identify a set of independent dimensions that can be used to calculate the distance between nominal values. In the Quantification Step, we use the independent dimensions and the distance information to assign order and spacing among the nominal values. In the Classing Step, we use results from the previous steps to determine which values within the domain of a variable are similar to each other and thus can be grouped together. Each step in the DQC approach can be accomplished by a variety of techniques. We extended the XmdvTool package to incorporate this approach. We evaluated our approach on several data sets using a variety of measures.
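
As an illustration only, the three steps might be sketched as follows. This is not the XmdvTool implementation: it assumes simple correspondence analysis (one of the listed keywords) for the Distance and Quantification Steps, and substitutes a plain gap threshold on the resulting scores for the Classing Step. The function names `quantify_nominal` and `class_values`, the `gap` parameter, and the example counts are all hypothetical.

```python
import numpy as np

def quantify_nominal(table):
    """Distance + Quantification (sketch): score each row value of a
    contingency table on the first correspondence-analysis axis.
    Assumes every row and column of the table has a nonzero total."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals from the independence model r * c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates; Euclidean distances between them equal
    # chi-square distances between the row profiles.
    coords = (U * s) / np.sqrt(r)[:, None]
    return coords[:, 0]                  # sign of the axis is arbitrary

def class_values(scores, gap):
    """Classing (simplified stand-in): walk the values in score order and
    start a new class whenever the jump to the next score exceeds `gap`."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    labels = np.empty(len(scores), dtype=int)
    current = 0
    labels[order[0]] = current
    for prev, nxt in zip(order[:-1], order[1:]):
        if scores[nxt] - scores[prev] > gap:
            current += 1
        labels[nxt] = current
    return labels

# Made-up counts: rows are values of the nominal variable being quantified,
# columns are values of a second nominal variable.
values = ["red", "crimson", "blue", "navy"]
counts = [[10, 0], [20, 0], [0, 15], [1, 14]]
scores = quantify_nominal(counts)
classes = class_values(scores, gap=0.5)
for name, score, label in zip(values, scores, classes):
    print(f"{name}: score={score:.3f}, class={label}")
```

Values whose profiles across the second variable are proportional (here "red" and "crimson") receive the same score, so they land in the same class. With more than two columns, several axes could be retained and a clustering method used in place of the threshold.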

Information Visualization (2004) 3, 80-95. doi:10.1057/palgrave.ivs.9500072

Keywords

Nominal data; visualization; dimension reduction; correspondence analysis; quantification; clustering; classing

Received 19 November 2003; revised 30 January 2004; accepted 13 March 2004