Original Article

Information Visualization (2007) 6, 64–74. doi:10.1057/palgrave.ivs.9500141

A visualization testbed for analyzing the performance of computational linguistics algorithms

Stephen G. Eick¹, Justin Mauger² and Alan Ratner³

  1. SSS Research, Inc.
  2. SAIC Advanced Systems & Concepts
  3. National Security Agency

Correspondence: Stephen G. Eick, SSS Research Inc. E-mail: eick@sss-research.com

This research was sponsored by the Air Force Research Laboratory, Air Force Materiel Command, USAF, under Contract number MDA972-03-9-0001. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL or the U.S. Government. This article is based on 'Visualizing the Performance of Computational Linguistics Algorithms' by Eick, Mauger, and Ratner, which appeared in the IEEE 2006 Visual Analytics Symposium.

Received 23 June 2006; Revised 31 July 2006; Accepted 23 October 2006; Published online 1 February 2007.

Abstract

We have built an AJAX-enabled, browser-based testbed for evaluating the performance of computational linguistics algorithms. The testbed consists of a visualization system and an analysis portal. Our focus is on algorithms that classify and cluster documents by assigning weights to words and scoring each document against high-dimensional reference concept vectors. The visualization and analysis techniques include confusion matrices, ROC curves, document visualizations showing word importance, and interactive reports. A unique aspect of the testbed is its document visualizations, built using Scalable Vector Graphics (SVG), which show why documents are assigned to particular concepts and categories.

Keywords:

AJAX, thin-client, scalable vector graphics, ROC curves, confusion matrices, document categorization
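
The class of algorithm the abstract describes, one that weights words and scores each document against reference concept vectors, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the term-frequency weighting, the cosine-similarity score, the two toy concept vectors, and the function names. The last function tallies the kind of actual-versus-predicted counts that a confusion matrix displays.

```python
import math
from collections import Counter

# Hypothetical reference concept vectors (word -> weight); values are illustrative only.
CONCEPTS = {
    "sports": {"game": 1.0, "team": 0.8, "score": 0.6},
    "finance": {"market": 1.0, "stock": 0.9, "price": 0.5},
}

def word_weights(text):
    """Assign a weight to each word. Raw term frequency is an assumed scheme."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text):
    """Score a document against every concept and return the best match and all scores."""
    weights = word_weights(text)
    scores = {name: cosine(weights, vec) for name, vec in CONCEPTS.items()}
    return max(scores, key=scores.get), scores

def confusion_matrix(labeled_docs):
    """Count (actual, predicted) pairs over documents with known labels."""
    matrix = Counter()
    for text, actual in labeled_docs:
        predicted, _ = classify(text)
        matrix[(actual, predicted)] += 1
    return matrix

if __name__ == "__main__":
    docs = [
        ("the team won the game", "sports"),
        ("stock price fell as the market closed", "finance"),
        ("the market team", "sports"),  # ambiguous wording, likely misclassified
    ]
    for (actual, predicted), count in sorted(confusion_matrix(docs).items()):
        print(f"actual={actual:8s} predicted={predicted:8s} count={count}")
```

Off-diagonal cells of such a matrix, where the actual and predicted labels differ, are the cases where inspecting per-word contributions to the score helps explain why a document was assigned to a concept.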