Introduction
Most visualization research on understanding relationships in large data sets implicitly assumes that a node-link diagram is appropriate. In fact, nearly all the submissions to the InfoVis 2004 contest (which required analyzing 10 years of InfoVis publications and their authors1) used node-link diagrams. We have designed many such displays ourselves, and we believe that node-link diagrams are very useful for topology-based tasks such as finding clusters or connected components, detecting patterns or outliers, and determining the shortest path between two nodes. However, node-link diagrams are limited in supporting attribute-based tasks on network data sets. In addition, they do not scale well and too often produce cluttered overviews with few readable labels. For example, consider a scientific publication network composed of several thousand papers (represented as nodes) linked by directed edges representing citations. If this network is paired with an author network of several thousand more researcher nodes linked by co-authorship, a node-link diagram would not easily support even a simple task such as reviewing the papers referenced by a selected author.
In this paper, we describe a different approach: NetLens,2 which was inspired by our submission to that contest (PaperLens3) and uses multiple simple coordinated views of ordered lists and histogram overviews. NetLens offers a general and scalable design based on a 'Content-Actor' model of information. Examples of Content-Actor pairs of interest to the visual analytics community include scientific publications and authors, emails and people, legal cases and courts, intelligence reports and countries or groups, and products and companies. In all these examples, both the content and the actors form networks: reports cite other reports, authors have co-authors, products replace or integrate other products, and so on. NetLens shows the paired networks of content and actors in coordinated views. It supports complex queries that are traditionally difficult to specify by letting users pose a series of elementary queries and refine them iteratively with visual overviews and sorted lists. In this paper, we extend our previous work2 with a usability study that assesses the usefulness of our approach.
After reviewing related work, this paper describes the NetLens interface using a subset of the ACM Digital Library. We show how users can readily accomplish tasks such as determining which sub-fields are trending and which are declining, finding appropriate experts to review a paper or serve as expert witnesses in patent litigation, or even determining relationships between groups based on an analysis of their publications. We then describe NetLens in more general terms and show how the Content-Actor data model and its related tasks are defined. Finally, we review possible evaluation methods for this novel approach and report a usability study.
Related work
Network data visualization
Many interactive visualization systems have been developed to help users explore and analyze collections of networked documents or communities of people.4, 5 Several websites6, 7 show a rich variety of network visualization examples. However, most network visualization efforts have focused on layout strategies, which can be classified as either node-link diagrams or matrix representations.8, 9
Visual interfaces to digital libraries offer an alternative approach to network visualization.10 For example, Butterfly11 combines search and browsing to visualize citation graphs. Envision,12 a digital library augmented with a flexible user interface, facilitates examination of very large data sets and helps users discover patterns in query results. GRIDL13 visualizes thousands of search results at once on a two-dimensional display that uses hierarchical, categorical axes.
Multiple coordinated views
Multiple coordinated views often improve user performance, enable discovery of unforeseen relationships, and offer unification of the desktop.14, 15 For example, Spotfire16 displays a single table in different types of views, such as scatter plots and bar charts. All the views are coordinated by brushing17 to allow users to relate data points across views. Visage18 provides richer data manipulation operations such as drill-down, drag, paint and roll-up to enable users to compose a complex query, but all its views are coordinated through a single centralized table, so it cannot represent relationships across multiple tables. Snap-Together15 enables users to create different types of coordination, such as brushing, drill-down, overview and detail views, and synchronized scrolling. Although some types of coordination can represent joined tables, their relationships are limited to one-to-one or one-to-many and thus are not appropriate for representing network relationships.
Iterative query refinement
Conventional search interfaces to online resources allow users to specify several query fields and initiate the query by pressing a button. While these systems are easy to operate, users often find it difficult to formulate or express their queries effectively. Furthermore, users often learn what they need to ask, and how to ask it, through the query process itself. Thus, iterative query refinement has been used in many information retrieval systems. For example, SIS19 commits queries whenever users change any of the filtering widgets. VQuery20 enables users to interactively refine query results, using Venn diagrams to provide dynamic query previews. PESTO21 provides a query history mechanism that enables users to reuse old queries.
NetLens interface
While NetLens supports any data set that can be represented with our abstract Content-Actor data model (see section 'NetLens data model'), we first illustrate the interface using a specific data set and describe simple scenarios of use in this context. The data set is a subset of the ACM Digital Library and contains 4073 papers from the CHI conference from 1982 to 2004, authored by 6358 people. The NetLens display (Figure 1) is divided into two symmetric sections. On the left is the content (i.e. the papers), and on the right are the actors (i.e. the authors of the papers). Each section includes several panels: overviews at the top (histograms showing the distribution of items over any of the available attributes), filters on the right side, and lists of items in the lower area. In each section, the panels are tightly coupled15 so that any change in one panel, such as a selection, is immediately reflected in the other panels. All elements of the display are used both to specify queries and to display results: for example, selecting a histogram bar filters the list of papers, and hiding certain papers from the list (e.g. the less cited ones) is reflected in the histogram.
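The coordination between an overview histogram and a list can be sketched as follows. This assumes that both panels are derived from one shared set of visible items, so a change made through either panel shows up in the other. The class and method names (`EntitySection`, `select_bar`, `hide`) are hypothetical and are not the actual NetLens C# API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EntitySection:
    """Hypothetical sketch of one NetLens side: an overview histogram and an
    item list, both computed from the same state so they stay coordinated."""
    items: list                                       # dicts of attribute values, each with an "id"
    attribute: str                                    # attribute mapped to the histogram X-axis
    selected_bars: set = field(default_factory=set)   # empty set means "all bars"
    hidden_ids: set = field(default_factory=set)      # items the user hid from the list

    def visible(self):
        # Items not hidden by the user; the histogram counts only these.
        return [it for it in self.items if it["id"] not in self.hidden_ids]

    def histogram(self):
        # One count per attribute value over the currently visible items.
        return Counter(it[self.attribute] for it in self.visible())

    def listed(self):
        # The list shows visible items whose bar is selected (or all of them).
        return [it for it in self.visible()
                if not self.selected_bars or it[self.attribute] in self.selected_bars]

    def select_bar(self, value):
        self.selected_bars.add(value)

    def hide(self, item_id):
        self.hidden_ids.add(item_id)
```

For example, selecting the 'CSCW' bar narrows the list. Hiding a paper from the list then lowers that bar's count, because both views are recomputed from the same visible items.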
Figure 1.
NetLens has two symmetric windows: the left for Content (papers) and the right for Actors (authors). Each side is further divided into panels: overview at the top, filters on the right, and lists at the bottom. Here, the Content side has two lists showing papers and their citations or references, and the lists on the Actor side show authors and their co-authors, respectively. The paper overview panel shows the distribution of papers (on a logarithmic scale) over time, grouped by topic, so users can see which topics grew or declined in number of papers over 22 years. On the right side, the author overview shows the distribution of countries of origin on a logarithmic scale.
Initially, the paper and author sections are not linked and are explored individually. The red oval at the center of the display indicates that no data flows between them; users can change this and set the direction of the flow by right-clicking on the red oval. When flow is allowed, the oval is replaced by a directional arrow (Figure 2c).
Figure 2.
Selected steps of the interaction in the scenario 'Learn about a group of researchers'. (A) In the overview of authors per country, seven Asian countries were selected and the list was filtered accordingly. (B) The overview is changed to show the distribution by institution, and 'Research Labs' and 'Inc.' are chosen. (C) The authors from Asian industry are sent to the paper side, revealing the topic and time distribution of their papers. (D) Sorting by number of citations reveals influential papers. Double-clicking opens a paper (or an author page) at ACM. (E) Influential papers (cited more than once) were selected and sent to the author side. The authors are sorted by number of influential papers. (F) All citations of the influential papers are saved in 'My List' and shown. The right side shows the country distribution of their authors.
Scenario: 'Learn about a group of researchers'
Let's imagine that John needs to learn more about a group of scientists from Asia who are knowledgeable in HCI. He thinks of several questions, such as: 'What countries are they from?', 'How active are they?', 'What topics are they working on?', 'Who leads with many influential papers?', and so on. He asks to see an overview of the authors by continent of origin by changing the attribute mapped to the X-axis to 'Continent'. He clicks on the Asia bar and scans the list of names and affiliations. Authors from Japan seem to dominate, so he sets the overview to show a distribution by country and sees that there are seven Asian countries. Japan has 200 authors, followed by Korea, India, Singapore, Taiwan, China, and Malaysia (Figure 2a). He unselects the three countries that have fewer than five authors, which filters the list to 242 authors. He sorts and visually scans the affiliations. He can see that the authors are almost equally split between industry and academia, so he decides to study the researchers in industry more closely, since he is more familiar with Asian companies than with Asian universities. He switches the overview again to show a distribution by institution and selects only 'Research Labs' and 'Inc.' in the histogram, narrowing the list down to 119 authors (Figure 2b).
Now he wonders what they worked on, so he sends the list of authors to the paper side by changing the red oval to a left arrow (Figure 2c). He sees that they have written 88 papers. By scrolling the paper overview, he sees many papers in CSCW and Multimodal User Interfaces. Papers are sorted by number of citations, so the top of the list now shows the most influential papers. Double-clicking on a paper brings it up in the ACM Digital Library (Figure 2d). When a paper is selected, its citations are also highlighted and brought to the top of the list of citations. He can switch to references instead of citations.
Next, John wonders who wrote the most influential papers. He selects the 22 influential papers, which have been cited more than once, then changes the direction of the arrow. Only 25 authors are left (Figure 2e). He can order them by the number of influential papers. On the top is Jun Rekimoto from Sony Computer Science Labs, who was an author of eight influential papers.
Finally, John decides to look at who was influenced by the work of the researchers in Asian industry. He selects all the citations of the 22 papers (66 citations from the CHI community), saves them in 'My List', and asks for that list to be shown in the histograms. The list of authors updates accordingly, showing the distribution of who cited the influential papers. John resets the author filters to see them all (instead of only the authors in Asian industry). He then switches to an overview by country to see which countries have been following this work (Figure 2f).
Now that we have introduced the NetLens interface with this scenario, we describe the underlying data model, and how and why we chose it, in section 'NetLens data model'. In section 'A general and scalable interface design', we return to some of the design challenges we encountered and how we addressed them in designing the NetLens interface.
NetLens data model
Data analysis
Entity-Relationship models22 of several common network data domains, such as email archives, personal photo collections, and digital libraries (Figure 3), show that network data sets are frequently composed of a small number of independent entities and their relationships. The relationships can be further divided into intra-relationships (relationships among instances of the same entity) and inter-relationships (relationships between different entities).
Figure 3.
Entity-relationship diagrams of three networks (e-mail collections, photo collections and scientific publications).
It is interesting to note that Entity-Relationship models have two properties that are useful for defining an abstract network data model. First, the structure of the data model can be manipulated to suit the tasks and tools. An entity's attribute can be separated from the entity to become a new independent entity. For example, one of the paper's attributes, 'Category', can be separated from the paper entity and converted into a new entity, so that papers and categories are linked by a 'HAS' relationship. Conversely, an independent entity can be merged into a related entity to become an attribute. For example, the author entity could become an attribute of the paper entity. Second, many network tasks can be described by following a path (an alternating sequence of entities and relationships) in the model. Suppose that you want to find out how many citations of the papers a person has written are not self-citations. You would have to search for papers written by the author, find all citations of those papers, find the authors of those citations, remove the original author, and filter the original set of citations to see only the papers written by the remaining authors. Finally, counting the papers in the resulting set yields the answer. The path for this search is: people (a researcher) → paper (written by the researcher) → paper (citations of those papers) → people (authors of the citations) → people (original author removed) → paper (written by those authors) → paper (intersect two lists of papers).
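The non-self-citation path can be sketched in Python as a chain of set operations, one per step of the path. The data layout is an assumption for illustration: authorship is a dict from paper id to author names, and citations are (citing, cited) pairs. The function name is hypothetical as well.

```python
def non_self_citations(researcher, authorship, citations):
    """Follow people -> paper -> paper -> people -> people -> paper -> paper.

    authorship: dict mapping paper id -> set of author names (inter-relationship)
    citations:  set of (citing_paper, cited_paper) pairs (intra-relationship)
    Returns the set of citing papers that survive the filter.
    """
    # people -> paper: papers written by the researcher
    own_papers = {p for p, authors in authorship.items() if researcher in authors}
    # paper -> paper: papers that cite any of those papers
    citing = {src for src, dst in citations if dst in own_papers}
    # paper -> people: authors of the citing papers
    citing_authors = set().union(*(authorship[p] for p in citing))
    # people -> people: remove the original researcher
    others = citing_authors - {researcher}
    # people -> paper: papers written by any of the remaining authors
    others_papers = {p for p, authors in authorship.items() if authors & others}
    # paper -> paper: intersect the two lists of papers
    return citing & others_papers
```

The last step keeps any citing paper that has at least one remaining author. As a result, a citing paper co-authored by the researcher and someone else is still counted, exactly as the path specifies. A stricter definition would need one more filtering step.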
Based on these two properties of Entity-Relationship models, we designed the abstract data model shown in Figure 4. The data model is composed of two symmetric entities, which we call Content and Actors, and their inter- and intra-relationships. We use the terms content and actors to help in understanding and simplifying network data. While some kinds of network data will be hard to map to this simple model, we are confident that the basic model can be used as is for many practical data sets and can be extended to represent more complex network data.
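The following sketch shows how one of the domains in Figure 3 maps onto the two-entity model. It converts a hypothetical email archive into Content (messages) and Actors (people). Two things here are assumptions and are not part of the NetLens specification: the record fields (`sender`, `recipients`, `in_reply_to`) and the choice of "who wrote to whom" as the actor intra-relationship.

```python
from dataclasses import dataclass, field


@dataclass
class ContentActorModel:
    content: dict = field(default_factory=dict)       # content id -> attributes
    actors: dict = field(default_factory=dict)        # actor id -> attributes
    inter: set = field(default_factory=set)           # (content id, actor id)
    content_links: set = field(default_factory=set)   # (content id, content id)
    actor_links: set = field(default_factory=set)     # (actor id, actor id)


def from_emails(emails):
    """Map hypothetical email records onto the Content-Actor model.

    Each record is a dict with 'id', 'subject', 'sender', 'recipients'
    and an optional 'in_reply_to' message id.
    """
    model = ContentActorModel()
    for e in emails:
        model.content[e["id"]] = {"subject": e["subject"]}
        for person in [e["sender"], *e["recipients"]]:
            model.actors.setdefault(person, {})
            model.inter.add((e["id"], person))              # message involves person
        if e.get("in_reply_to"):
            model.content_links.add((e["id"], e["in_reply_to"]))  # reply chain
        for r in e["recipients"]:
            model.actor_links.add((e["sender"], r))         # who wrote to whom
    return model
```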
Task analysis
Using the abstract Content-Actor data model as a basis, we examined tasks collected from group interviews and discussions with our stakeholders, as well as from graph visualization user studies. Interestingly, most of the collected network analysis tasks were based on the attributes of the two entities and their intra- and inter-relationships; only a few were topology-based tasks better suited to the node-link diagram approach.
The network tasks we identified can be classified into two groups based on our abstract data model: single-step and multi-step tasks. A single-step task is a query about an entity and its attributes, or about a single step along an intra- or inter-relationship. The following are examples of single-step tasks for a digital library data set with paper and author entities:
- How many papers on 'User Study' were published in 1998?
- Who are the authors of the papers on 'Virtual Reality', which were published at the CHI 99 conference?
- Which paper is most frequently cited by the papers published at the CHI 2004 conference?
- Which author is most frequently cited in the 'InfoVis' topic?
- How many papers were published by UMD HCIL people?
- Who are the authors of Korean nationality?
Table 1 shows the classification of single-step tasks, which is based on our abstract Content-Actor data model. It shows that each entity can be used either for a search query or for a search result. In other words, papers can be searched either by papers or by people, just as authors can be searched by papers or by people. Owing to the symmetric structure of the entities in our data model, the search results of a single-step task can be recursively used as an input to other single-step tasks, which enables users to achieve more complex multi-step tasks by iterating over the table.
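Several cells of Table 1 can be sketched as small functions that map a set of one entity to a set of the same or the other entity. Because the output of one function is a valid input to another, chaining them reproduces the iteration described above. The pair-set representation and all function names are assumptions for illustration.

```python
def papers_by_people(people, authorship):
    """Search papers by people (inter-relationship: author -> paper)."""
    return {paper for paper, author in authorship if author in people}


def people_by_papers(papers, authorship):
    """Search people by papers (inter-relationship: paper -> author)."""
    return {author for paper, author in authorship if paper in papers}


def papers_citing(papers, references):
    """Search papers by papers (intra-relationship: cited -> citing)."""
    return {citing for citing, cited in references if cited in papers}


def coauthors(person, authorship):
    """Two chained single-step tasks: people -> papers -> people."""
    papers = papers_by_people({person}, authorship)
    return people_by_papers(papers, authorship) - {person}
```

For instance, `people_by_papers(papers_citing(p, refs), authorship)` answers "who cites these papers?" by applying two single-step tasks in sequence.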
Table 1 - Sample single-step tasks (on digital libraries) classified by the combination of two entities and the way the entities are used (either search-by entities or result entities).
Single-step tasks can be linked together to form multi-step solutions to more complicated and high-level network tasks. The following examples show some interesting high-level tasks that people might ask about digital libraries.
- Evaluate individuals:
  - how many papers were self-referenced?
  - how frequently was each paper referenced by other papers?
- Identify communities:
  - what are the major topics of papers published by UMD HCIL, and who in this group has the most papers in each topic?
  - how have UMD HCIL's research interests changed over time, and who in this group made that change?
- Find experts (to review papers or attend a workshop):
  - who wrote the most papers in the InfoVis topic, and how many papers cited that author's papers?
  - whose paper in the InfoVis area is most frequently referenced by other papers?
- Learn about a new topic (e.g. to find a good PhD topic):
  - which topic has a growing number of publications, and who contributed most to it in the last 3 years?
  - what other topics are the authors in the InfoVis area also interested in?
- Decide where to go on a sabbatical:
  - authors from which country (or research group) most frequently reference my papers?
The above example tasks can be answered by iterating through Table 1 multiple times.
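As a sketch of such an iteration, the sabbatical question can be answered by chaining single-step tasks and then aggregating the result by an actor attribute, much as the country overview does. The inputs are hypothetical. Ignoring citing papers that the user co-authored is an assumption made for illustration.

```python
from collections import Counter


def countries_citing_me(me, authorship, references, country):
    """Which countries' authors most frequently reference my papers?

    authorship: set of (paper, author) pairs; references: set of
    (citing, cited) pairs; country: dict author -> country.
    """
    # Step 1 (people -> papers): papers I wrote.
    mine = {p for p, a in authorship if a == me}
    # Step 2 (papers -> papers): papers citing mine, minus papers I co-wrote.
    citing = {src for src, dst in references if dst in mine} - mine
    # Steps 3-4 (papers -> people, then aggregate by an actor attribute):
    # each author appearance on a citing paper adds one to that author's country.
    return Counter(country[a] for p, a in authorship if p in citing)
```

Calling `.most_common(1)` on the result gives the top country, which is the same answer a user reads off the tallest histogram bar.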
A general and scalable interface design
NetLens uses a general interface design that can be applied to any data set represented with our abstract Content-Actor data model. This requires exactly two entity types, each with attributes, and relationships within and between those entity types. For example, in conference publications, the two entity types are papers and people. Papers have attributes such as titles, abstracts and keywords. They cite and are cited by other papers, and they are authored by people. People have attributes such as institutions and fields of interest; they are connected to papers through authorship and possibly to other people through advising or committee memberships. This basic structure applies equally to collections of photos (with the primary entities being photos and the objects within those photos), email collections and even legal cases.
An intra-relationship is reflected in the two lower panels of the Papers side, with the list of papers on the left and their references or citations on the right (Figure 1). NetLens uses whatever metadata (i.e., attributes) is available, creates filters and displays the results. The quality of the interface therefore depends heavily on the richness and value of the metadata.
NetLens is also scalable because it uses a standard relational database and its interface is built from common, simple components such as histograms and lists. Together, these components offer surprisingly rich support for real, complex tasks. By avoiding visual overviews of the entire data set that display a visual element for each entity instance, we avoid immediate problems of scalability.
Interface challenges and proposed solutions
In this section, we discuss some of the design challenges uncovered during our early usability testing, along with the proposed solutions.
Dataflow control
One of the most important issues in designing the NetLens interface was how to control dataflow between and within the entity windows (between-entity and within-entity dataflow hereafter). The initial between-entity dataflow model was simple. We designed the entity windows to be tightly coupled so that any data changes in one entity window were automatically transferred to the other. However, the tightly coupled entity windows often resulted in a cyclic dataflow, and moreover, the automatic between-entity dataflow often caused unintended and unnecessary query refinement as well as performance problems. Therefore, the between-entity dataflow model was revised to let users manually select the direction of dataflow (if any) by using an additional interface widget (red arrow) as shown in Figure 2c.
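A sketch of the revised between-entity model follows. It assumes that sending data intersects the target side's current filter with the related items. That assumption is consistent with the need to reset the author filters in the scenario and in the study results. Because only one user-chosen direction is active at a time, updates cannot cycle between the two sides. The class and member names are hypothetical.

```python
from enum import Enum


class Flow(Enum):
    NONE = "oval"        # red oval: the two sides are explored independently
    TO_ACTORS = "right"  # arrow pointing to the Actor side
    TO_CONTENT = "left"  # arrow pointing to the Content side


class BetweenEntityLink:
    """Hypothetical sketch of user-controlled dataflow between entity windows."""

    def __init__(self, authorship):
        self.authorship = authorship   # set of (paper, author) pairs
        self.flow = Flow.NONE

    def propagate(self, papers, authors):
        """Return the (papers, authors) filters after applying the current flow."""
        if self.flow is Flow.TO_ACTORS:
            # Keep only authors of the current papers, within the existing author filter.
            authors = authors & {a for p, a in self.authorship if p in papers}
        elif self.flow is Flow.TO_CONTENT:
            # Keep only papers by the current authors, within the existing paper filter.
            papers = papers & {p for p, a in self.authorship if a in authors}
        return papers, authors
```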
On the other hand, all the sub-panels in the entity windows are tightly coupled. For example, if some papers are filtered out from the overview panel in Figure 5, then the list of papers in the detail-on-demand panel will be updated (1 → 2), and the list of references in the intra-relationship panel will also be updated accordingly (2 → 3). However, NetLens stops the within-entity dataflow and does not allow it to be propagated to the overview panel (3 → 1), in order to avoid cycles. Although direct dataflow between 1 and 3 is not allowed in NetLens, users can make use of the 'My List' feature for indirect dataflow between 1 and 3. Users can collect any entity instances and save them to 'My List' while exploring data, and then use this collection as a filter in the overview panel.
Figure 5.
The within-entity dataflow among three sub-panels. (Note that red numbered circles and arrows are not part of the actual interface.)
History and integrated help
As users generally perform a sequence of interactions to accomplish a task, they sometimes get lost during the data exploration and ask themselves questions, such as, 'How did I get here?' or 'What does the current filtered data set mean?' To avoid such problems, we added two support mechanisms in NetLens: query history management and integrated guidance.
The query history management system records every interaction users make in NetLens and automatically annotates each interaction to help users remember the interactions they have made before. By using the list of annotated interactions (Figure 6), users can go back and forth to confirm or correct their previous interactions. The annotations can be manually edited later if any interaction needs to be described more precisely. The history of interactions can be saved to a file and retrieved later for reuse.
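Here is a rough sketch of annotated history recording with file persistence. Several details are assumptions that the source does not specify: JSON as the file format and a simple `action: key=value` annotation template. Discarding later steps when a new action is recorded after going back is also an assumption, borrowed from common undo/redo behavior. All names are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Step:
    action: str        # hypothetical action name, e.g. "select_bars"
    params: dict       # parameters of the interaction
    annotation: str    # human-readable description, editable by the user


class QueryHistory:
    def __init__(self):
        self.steps = []
        self.current = -1                  # index of the step currently displayed

    def record(self, action, params):
        # Assumption: recording after going back drops the later steps.
        del self.steps[self.current + 1:]
        text = f"{action}: " + ", ".join(f"{k}={v}" for k, v in params.items())
        self.steps.append(Step(action, params, text))
        self.current = len(self.steps) - 1

    def go_to(self, index):
        # Return the steps to replay in order to rebuild the state at that point.
        self.current = index
        return self.steps[: index + 1]

    def edit_annotation(self, index, text):
        self.steps[index].annotation = text

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump([asdict(s) for s in self.steps], f, indent=2)

    @classmethod
    def load(cls, path):
        history = cls()
        with open(path, encoding="utf-8") as f:
            history.steps = [Step(**d) for d in json.load(f)]
        history.current = len(history.steps) - 1
        return history
```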
Figure 6.
History management system shows the list of steps users have performed.
The integrated guidance system helps users understand the meaning of the current data set in each panel. Each panel has a '?' button at its top right corner. Clicking that button makes a yellow, sticky-note-like window appear over the window, showing which queries have been applied so far to the data set in that panel.
Multi-layered interface
Although the NetLens interface is composed of two entity windows and six sub-panels (three per entity) to represent the whole abstract Content-Actor data model, users do not always need all the panels, and the many components may make the interface difficult to learn. To give users more control over the interface, we designed a multi-layered interface23, 24 that lets users specify which panels are visible. We chose three levels: (a) overview only, (b) overview + detail-on-demand list, and (c) overview + detail-on-demand list + intra-relationship list (e.g. references/citations). Users can also start by viewing a single entity.
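The three layers can be expressed as a simple mapping from level to visible panels. The enum and panel names below are illustrative and are not taken from NetLens.

```python
from enum import IntEnum


class Layer(IntEnum):
    OVERVIEW = 1   # level (a): overview only
    DETAIL = 2     # level (b): overview + detail-on-demand list
    INTRA = 3      # level (c): + intra-relationship list


PANELS = ["overview", "detail-on-demand list", "intra-relationship list"]


def visible_panels(layer, entities=("Content", "Actors")):
    """Panels shown at a given layer; pass one entity to start with a single side."""
    return {entity: PANELS[: int(layer)] for entity in entities}
```

At the highest level with both entities, this yields the six sub-panels of the full interface.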
Data export and integration
NetLens provides a rapid way to sift through a lot of data and identify areas of interest. It can transfer any filtered or selected set of data to other applications through the Windows clipboard (using copy and paste), an internal graph class object, or even XML documents. To illustrate this capability, we connected TreePlus,25 a graph browsing tool we developed, to allow users to study the co-authorship and citation networks in more detail (Figure 7).
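The following sketches how a selected set could be exported as XML using Python's standard `xml.etree.ElementTree`. The element and attribute names are hypothetical, since the source does not describe NetLens's XML format.

```python
import xml.etree.ElementTree as ET


def export_selection(entity, items):
    """Serialize a filtered or selected set of items to an XML string.

    Each item is a dict with an 'id' key; other keys become child elements.
    Element names here are illustrative only.
    """
    root = ET.Element("selection", entity=entity, count=str(len(items)))
    for item in items:
        node = ET.SubElement(root, "item", id=str(item["id"]))
        for key, value in item.items():
            if key != "id":
                ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

A receiving tool such as a graph browser could then parse the string with `ET.fromstring`.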
Figure 7.
TreePlus shows the co-authorship network between Ben Bederson and the Japanese authors who cite him.
System architecture
NetLens was written in C# using the University of Maryland's open-source Piccolo toolkit26 for histogram visualization. Piccolo is a layer built on top of a low-level graphics API; it facilitated the implementation of the 2D structured histogram component by providing many useful features such as screen repainting, bounds management, event handling and dispatch, picking, and layout.
NetLens uses an MS Access database, but it can be connected to any relational database through ODBC, as long as the database fits the basic data schema: two tables representing the two entities, one table for their inter-relationship, and two optional tables representing the intra-relationship of each entity (Figure 8). One of the major benefits of using a relational database system to implement NetLens was that it is relatively easy to derive a concrete data schema from our abstract Content-Actor data model. Good performance and relatively small storage size are two additional benefits of current relational database technology.
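Below is a minimal sketch of the schema in Figure 8. It uses Python's built-in `sqlite3` module as a stand-in for MS Access over ODBC. The column names are assumptions, because the source names only the tables. The query illustrates why histogram overviews scale: the database returns one row per bar, no matter how many papers match.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Paper        (paper_id INTEGER PRIMARY KEY, title TEXT, year INTEGER, topic TEXT);
CREATE TABLE Author       (author_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
-- inter-relationship between the two entities
CREATE TABLE Authorship   (paper_id INTEGER REFERENCES Paper, author_id INTEGER REFERENCES Author);
-- optional intra-relationships, one per entity
CREATE TABLE Reference    (citing_id INTEGER REFERENCES Paper, cited_id INTEGER REFERENCES Paper);
CREATE TABLE Coauthorship (author_id INTEGER REFERENCES Author, coauthor_id INTEGER REFERENCES Author);
"""


def topic_histogram_for_country(conn, country):
    """Papers per topic written by authors from one country (one row per bar)."""
    query = """
        SELECT p.topic, COUNT(DISTINCT p.paper_id)
        FROM Paper p
        JOIN Authorship ap ON ap.paper_id = p.paper_id
        JOIN Author a      ON a.author_id = ap.author_id
        WHERE a.country = ?
        GROUP BY p.topic
        ORDER BY p.topic
    """
    return conn.execute(query, (country,)).fetchall()
```

`COUNT(DISTINCT ...)` ensures that a paper with several authors from the same country is counted once in its topic bar.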
Figure 8.
NetLens Data Schema: Paper and Author tables represent two individual entities, Authorship table shows inter-relationship between them, and Reference and Coauthorship tables show intra-relationship for each entity.
The NetLens architecture is composed of two parts: the NetLens Model and the NetLens Viewer. The Model manages all data-related processing, so the Viewer can simply request the necessary data from the Model and visualize it without depending on the database system the Model uses. The Model can be easily replaced by other modules to support other kinds of data stores, such as XML databases, object-oriented databases (OODB), or plain text files.
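The Model/Viewer separation can be sketched with a structural interface. The viewer code depends only on the methods it calls, so any backend that provides them can be swapped in. The method names and the in-memory backend are hypothetical; the actual NetLens Model is written in C#, and its interface is not described here.

```python
from typing import Protocol


class NetLensModel(Protocol):
    """What the viewer needs from any backend (hypothetical method names)."""

    def attribute_values(self, entity: str, ids: set, attribute: str) -> dict: ...

    def related(self, ids: set, relation: str) -> set: ...


class InMemoryModel:
    """One interchangeable backend: plain Python dicts instead of a database."""

    def __init__(self, attributes, relations):
        self.attributes = attributes   # entity -> id -> {attribute: value}
        self.relations = relations     # relation name -> set of (source, target)

    def attribute_values(self, entity, ids, attribute):
        return {i: self.attributes[entity][i][attribute] for i in ids}

    def related(self, ids, relation):
        return {t for s, t in self.relations[relation] if s in ids}


def overview_counts(model: NetLensModel, entity, ids, attribute):
    """Viewer-side code: builds histogram counts without knowing the backend."""
    counts = {}
    for value in model.attribute_values(entity, ids, attribute).values():
        counts[value] = counts.get(value, 0) + 1
    return counts
```

A database-backed class with the same two methods could replace `InMemoryModel` without any change to `overview_counts`.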
Evaluation
Evaluating complex interfaces is a challenge, especially in the fields of exploratory search27 and information visualization.28 In the case of NetLens, three approaches stand out as good possibilities.
- Usability: evaluation could measure how usable the interface is, using classic usability measures such as speed of performance or error rates on simple imposed representative tasks.
- Generality: the evaluation can focus on the generality claim of the system by evaluating how easy it is to create new applications for new domains and data models.
- Power: evaluating NetLens could consist of evaluating the range and complexity of queries that can be answered, characterizing the queries that cannot be answered, and then comparing the complexity of the user task with that of more traditional approaches such as SQL queries, either theoretically or empirically.
We have not yet focused extensively on evaluation, as we are finishing our first mature prototype, but some progress has been made on the first two approaches.
Usability
We gathered feedback about usability in three different ways: comments were collected from about 30 users during individual demonstrations, an independent heuristic evaluation was conducted at NIST with five reviewers, and we performed a usability evaluation with nine participants.
Comments during presentations
The ACM Digital Library turned out to be a very good application domain to investigate, because we found ourselves surrounded by potential users who are familiar with the data and its limitations, have queries of their own, and are often mentioned in the data itself, which piques their interest and increases their motivation to participate and provide feedback. Overall, we have collected feedback from dozens of users who saw or interacted with our interface in the lab or at large government-sponsored project meetings (intelligence analysts, researchers from industry and academia, software developers). They made us aware of many low-level usability problems, such as the readability of the histogram labels, color choices, and needed improvements to the table displays. Nevertheless, NetLens was generally well received. Users seemed relieved to move away from traditional node-link diagram displays and appreciated the simplicity of the tables and charts. They agreed that the complex queries executed by NetLens would be very difficult, if not impossible, for them to specify in SQL.
Independent heuristic evaluation
An independent heuristic evaluation was conducted in the spring of 2006 at the National Institute of Standards and Technology (NIST) in the context of the NIMD program. We provided the software, installation procedure, and video training materials; the developers were not present. Five reviewers used the training videos and worked individually on six tasks using NetLens. They used both general HCI heuristics and visualization-specific heuristics to categorize problems and issues. A report collating the five individual reports, together with the individual reviews and video recordings of all the interactions, suggests that the overall interaction metaphor was understandable and that users could get started with NetLens on their own. Low-level usability problems were identified. Some of the tasks could not be accomplished because users did not always choose a successful strategy. For example, one task asked whether any people in a small group of authors had a temporary drop in their publication rate. A simple strategy was to set the overview to show a timeline of publications and then look at the publications of individual authors one at a time, but one participant kept looking in vain for a specific interface function that would accomplish this task. Many of the comments concerned training. Our training videos were rough (e.g. they had no rewind or fast-forward controls) and did not cover all details of the interface (e.g. shortcuts or undo). Reviewers also requested that the videos include more examples of strategies.
Qualitative usability study
A qualitative usability study was conducted during the summer of 2006 using the ACM-CHI data set. Nine participants (seven male and two female, including one pilot participant) were recruited from Microsoft Research. The tasks were slightly modified after the pilot, so the pilot participant's data is included only in the discussion of the usability problems encountered.
Procedure: Participants watched the set of training videos, which took about 20 min. They were also given additional explanations of features that were not shown in the videos; for example, some important items, such as the arrows and the yellow help window, were not visible in the videos. We also explained the color schemes. Participants were then given about 10 min to explore the NetLens interface freely and become familiar with it. They were encouraged to ask questions after each video and during the practice period.
Next, the participants were asked to conduct 10 tasks (described below). Completion times and error rates were recorded. Participants were requested to answer all questions using only the information in our database (i.e. they could not access the Internet). They were encouraged to think aloud to tell us what they were trying to do, and what did or did not work well. When participants had obvious problems completing a task, the experimenter gave them hints. In addition to these hints, the experimenter recorded the participant's comments and whatever usability issues were observed. At the end, participants filled out a satisfaction questionnaire. Each session lasted about an hour. There was no monetary compensation for participants, but refreshments were provided.
Tasks: The tasks were chosen not only to identify low-level usability problems but also to evaluate users' awareness of the information flow. The first eight tasks were ordered by expected difficulty, based on the required interactions and the number of times the flow had to be switched between the two entity views. For example, task 1 did not require users to send information to the other entity view. Tasks 2, 3, and 5 required users to send information to the other view once. Tasks 4 and 6 built on their respective preceding tasks but did not require new information flow. Tasks 7 and 8 required setting the information flow twice, and task 8 additionally required the 'My List' feature. The last two tasks were designed to see how users would use the help and history features to understand the information flow. The tasks used are listed below, with the names of people and regions removed:
1. Which [region] has the most authors? (Please ignore the 'Unknown' data category.)
2. Which author in the 'CSCW' area has written the most papers?
3. How many papers did [name] publish at CHI?
4. How many of [name]'s papers were in the 'InfoVis' area?
5. What topics have [region] authors published papers about?
6. During what years were there no publications from [region]?
7. From which universities did [name]'s co-authors come?
8. What research labs referenced [name]'s most often referenced paper?
9. What user actions were taken to get the current paper list? (Please use the help system and describe them as specifically as possible.)
10. What user actions were taken to get the current people list? (Please use the help and history systems and describe them as specifically as possible.)
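Task 8 shows how a complex question decomposes into a chain of elementary queries that alternate between the paper and author views. The sketch below is our own illustration, not NetLens code: it assumes a hypothetical in-memory `ContentActorNetwork` class with made-up toy data (papers `p1` to `p4`, authors `x`, `y`, `z`, and organizations `Lab A` and `Lab B`). It runs the four steps in order: author to papers, pick the most-cited paper, follow the citation (intra) relationship, then send the citing papers back to the author side and build a histogram by organization.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ContentActorNetwork:
    """Hypothetical in-memory Content-Actor model (illustration only).

    cites:   content -> contents it references (intra relationship)
    authors: content -> actors linked to it (inter relationship)
    """
    actor_attrs: Dict[str, dict]
    cites: Dict[str, Set[str]] = field(default_factory=dict)
    authors: Dict[str, Set[str]] = field(default_factory=dict)

    # Each elementary query takes a set of ids and returns a set of ids,
    # so queries can be chained and refined one step at a time.
    def actors_to_content(self, actors: Set[str]) -> Set[str]:
        return {c for c, linked in self.authors.items() if linked & actors}

    def content_to_actors(self, contents: Set[str]) -> Set[str]:
        return set().union(*(self.authors.get(c, set()) for c in contents))

    def cited_by(self, content: str) -> Set[str]:
        return {c for c, refs in self.cites.items() if content in refs}

    def histogram(self, actors: Set[str], attr: str) -> Counter:
        return Counter(self.actor_attrs[a][attr] for a in actors)


net = ContentActorNetwork(
    actor_attrs={"x": {"org": "Lab A"}, "y": {"org": "Lab B"}, "z": {"org": "Lab A"}},
    cites={"p2": {"p1"}, "p3": {"p1"}, "p4": {"p2"}},
    authors={"p1": {"x"}, "p2": {"y"}, "p3": {"z"}, "p4": {"z"}},
)

papers = net.actors_to_content({"x"})                       # 1. papers by author x
top = max(papers, key=lambda p: len(net.cited_by(p)))       # 2. most-cited of them
citing = net.cited_by(top)                                  # 3. papers citing it
labs = net.histogram(net.content_to_actors(citing), "org")  # 4. back to authors
print(sorted(labs.items()))  # [('Lab A', 1), ('Lab B', 1)]
```

Because every step takes and returns a set of identifiers, the calls can be reordered or repeated, which mirrors the iterative refinement the tasks were designed to exercise.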
Time and error results: Participants were able to complete most tasks within about 2 min, as shown in Figure 9. While they completed 86% of all tasks without hints, only 73% of tasks were answered correctly. No one answered task 8 correctly, because no participant retrieved the whole citation list from the database (see Figure 10 and the discussion below). Only one participant answered task 3 correctly (see below). For task 7, one participant forgot to remove Robert Woodruff, and two forgot to reset the author filters after sending the papers back to the author view (and therefore saw only Allison Woodruff and not her co-authors). Three participants were not able to answer task 10. One relied only on the help button in the author list, one selected a step in the middle of the history and got lost, and the third described the authors related to the selected papers instead of the selected authors.
Figure 9. Average task completion times using NetLens with the ACM-CHI data set. Error bars represent standard deviations.
Usability issues: Several usability issues were observed. We categorize them into (1) low-level general usability issues and (2) information flow and related problems. Issues are also prioritized by how many participants encountered or mentioned them, and by whether they prevented participants from finding an answer or how long they delayed it.
Among the low-level usability problems, the biggest issue concerned the 'Get Whole List' button. Because the system currently returns only the top 100 items for performance reasons, users must request the whole list before sending it to My List whenever it contains more than 100 items. For task 8, participants were required to get the whole list of citations for the Cone Trees paper. No one used the button; participants simply pressed Ctrl+A to select all the items in the list. This was a significant problem that would be easy to fix, for example by using the .NET ListView control in virtual mode, which should allow users to browse the entire list without the performance cost of populating the ListView with a large number of items. Two other issues were related to labels. First, labels for the histogram bars were too small to read; several users did not even realize that labels were displayed at the bottom of each bar. Second, the meaning of some labels was confusing. For example, three users mentioned that 'frequency' in the author list was hard to interpret. Participants also gave us feedback on the arrow representing the information flow (Figure 2). While they acknowledged that the arrow helped show the direction of the information flow, they thought it was poorly designed. Four participants wanted to invoke its menu with a left click instead of a right click, and three thought that the size and color of the arrow were inappropriate.
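The fix suggested above relies on a virtual list, in which the control asks the application for rows only as they scroll into view. The Python sketch below illustrates the idea independently of .NET; the class `VirtualList` and the `fetch_page(offset, limit)` callback are hypothetical stand-ins for the real database query. It also makes "select all" refer to the full result set rather than to the rows loaded so far, which is what participants implicitly expected when they pressed Ctrl+A.

```python
from typing import Callable, Dict, List, Sequence


class VirtualList:
    """Hypothetical virtual-list model: rows are fetched lazily, one page at a time."""

    def __init__(self, total: int, fetch_page: Callable[[int, int], Sequence[str]],
                 page_size: int = 100) -> None:
        self.total = total                      # size of the full result set
        self.page_size = page_size
        self._fetch_page = fetch_page           # stands in for a paged database query
        self._pages: Dict[int, List[str]] = {}  # page index -> cached rows
        self.fetch_count = 0                    # pages actually loaded so far

    def __len__(self) -> int:
        # The scrollbar is sized from the total, not from the rows loaded.
        return self.total

    def row(self, index: int) -> str:
        if not 0 <= index < self.total:
            raise IndexError(index)
        page = index // self.page_size
        if page not in self._pages:
            offset = page * self.page_size
            self._pages[page] = list(self._fetch_page(offset, self.page_size))
            self.fetch_count += 1
        return self._pages[page][index - page * self.page_size]

    def visible_rows(self, first: int, count: int) -> List[str]:
        # Only the rows currently on screen are materialized.
        last = min(first + count, self.total)
        return [self.row(i) for i in range(first, last)]

    def select_all(self) -> range:
        # Selection covers every row in the result set, loaded or not,
        # so sending it to My List never silently truncates it.
        return range(self.total)


citations = [f"citing paper {i}" for i in range(250)]  # toy result set
view = VirtualList(len(citations), lambda off, n: citations[off:off + n])
print(view.visible_rows(120, 3))  # ['citing paper 120', 'citing paper 121', 'citing paper 122']
print(view.fetch_count)           # 1
print(len(view.select_all()))     # 250
```

The page size of 100 echoes the current top-100 limit, but with a virtual list that limit affects only how much is loaded at once, not what the user can see or select.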
Several other low-level usability problems were uncovered as well. Three participants wanted to select a topic by clicking its label in the histogram view, and three commented that having many bars with zero items made the screen look busy without being useful. Some issues were related to interactions with the lists: three participants wanted to be able to toggle the selection of list items, and two wanted a way to unhide hidden items without using the 'Back' button. Two participants asked us to show counts wherever possible, together with totals. Users also noted inconsistencies with Windows conventions. For example, the history order is inverted, and while multiple histogram bars can be selected, Shift+Click is not supported. Graying out check box labels to deemphasize them is also unconventional, and one participant mentioned that the '?' button does not conform to Windows standards for help.
The more challenging usability issues were related to information flow. They included problems interpreting the counts provided and the limited visibility of the filters. As in every usability test, design mistakes became apparent. For example, the number of publications of an author could in principle be presented in two ways: (1) as an attribute of the author, or (2) as the number of paper entities written by the author. NetLens uses the second way, but participants were clearly misled into believing that the first was supported as well. In task 3, after participants found the target author (Bederson, who had three variant entries), they needed to send the authors to the paper side to see the list of papers published and read the count at the top of the list, but eight of the nine participants never initiated the information flow from the author side to the paper side. They simply summed the frequencies in the author list, apparently believing that those frequencies represented the total number of papers (they had to add the numbers because our data set has three spelling variants for Bederson). This was a severe problem, explained by the fact that we mistakenly displayed a '1' (instead of a blank) in the frequency column when no flow had been initiated (i.e. when no papers had yet been sent to the author side). In contrast, for tasks 4 and 5, all participants correctly sent the selected authors to the paper side. We believe they initiated the data flow because they did not think of the number of InfoVis papers as an attribute of an author (task 4), or of the number of publications as an attribute of a country (task 5). We observed a similar issue with task 7: two participants got stuck in the paper list, and we had to hint that they needed to send the papers back to the author view to get author information.
In fact, four participants wanted to find co-authors directly from the paper list and said that NetLens requires too many steps to get co-authors.
In other words, NetLens needs to provide more visual support for understanding derived relationships. Better training would also help. One participant pointed out that there was no easy way to separate the authors of each paper within a set of papers in the paper list. Tooltips listing the authors of a paper were suggested, but dynamic highlighting between entities might fit the current design better (similar to the dynamic highlighting of intra relationships between papers and their references). A tighter integration of NetLens with TreePlus would also allow users to choose the representation they find more natural. Four participants needed a hint to remember to use 'My List' to answer task 8. Three participants wanted to send papers directly from the citations list, where their focus was, to the author view, and suggested adding a corresponding item to the list's popup menu.
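The count confusion in task 3 can be made concrete. In the sketch below, which assumes rather than reproduces NetLens internals, the number of papers by a person is derived by sending all of that person's author entries to the paper side and counting the distinct papers returned. An author's frequency is computed only relative to the papers currently sent to the author side, and when no papers have been sent it returns `None` (a blank cell) instead of a default of 1. The toy data, the made-up spelling variants `J. Doe`, `Doe J` and `John Doe`, and both function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy data: paper id -> author entries on that paper.
# "J. Doe", "Doe J" and "John Doe" are made-up spelling variants of one person.
PAPER_AUTHORS: Dict[str, Set[str]] = {
    "p1": {"J. Doe", "A. Roe"},
    "p2": {"Doe J"},
    "p3": {"John Doe", "A. Roe"},
    "p4": {"A. Roe"},
}


def send_authors_to_papers(authors: Iterable[str]) -> Set[str]:
    """Actor -> content flow: every paper linked to at least one selected entry."""
    selected = set(authors)
    return {p for p, names in PAPER_AUTHORS.items() if names & selected}


def author_frequency(author: str, sent_papers: Optional[Set[str]]) -> Optional[int]:
    """Frequency cell: how many of the papers sent to the author side list `author`.

    Returns None (rendered as a blank cell) when no papers have been sent,
    rather than a misleading default value such as 1.
    """
    if sent_papers is None:
        return None
    return sum(1 for p in sent_papers if author in PAPER_AUTHORS[p])


variants = ["J. Doe", "Doe J", "John Doe"]
papers = send_authors_to_papers(variants)
print(len(papers))                         # 3 distinct papers by this person
print(author_frequency("J. Doe", None))    # None: no flow initiated yet
print(author_frequency("A. Roe", papers))  # 2 of those papers list A. Roe
```

Reading the count from the paper side, as in `len(papers)`, is the route the task intended; the blank frequency makes it harder to mistake an uninitialized cell for a paper count.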
Another aspect that should be improved is the visibility of the filters users have set, and the distinction between filters activated by users and those applied by the system. For example, three participants thought that Reset Paper/Author Filters would reset all filters, including those applied by the information flow, rather than only the author filters they had previously selected. On three occasions, a participant forgot to reset a filter when needed. Two participants mentioned that when they changed the direction of the information flow, it was not obvious whether filters were added or overwritten.
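One way to address these filter problems is to keep user-set and system-set filters in separate slots and to display both. The sketch below assumes a hypothetical `FilterState` class for each entity view. The policy that a new information flow replaces the previous flow filter, rather than adding to it, is our assumption, chosen to illustrate an explicit and visible rule; it is not documented NetLens behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Item = Dict[str, object]
Predicate = Callable[[Item], bool]


@dataclass
class FilterState:
    """Hypothetical filter state for one entity view (e.g. the author list)."""
    user: Dict[str, Predicate] = field(default_factory=dict)  # set by the user
    flow: Optional[Predicate] = None  # set by the system when data is sent over
    flow_label: str = ""

    def add_user_filter(self, label: str, pred: Predicate) -> None:
        self.user[label] = pred

    def set_flow_filter(self, label: str, pred: Predicate) -> None:
        # Assumed policy: a new flow replaces the previous flow filter.
        self.flow, self.flow_label = pred, label

    def reset_user_filters(self) -> None:
        # Clears only what the user set; the flow filter stays in place.
        self.user.clear()

    def reset_all(self) -> None:
        self.user.clear()
        self.flow, self.flow_label = None, ""

    def describe(self) -> List[str]:
        # Text a UI could display so each active filter and its origin is visible.
        lines = [f"[user] {label}" for label in self.user]
        if self.flow is not None:
            lines.append(f"[flow] {self.flow_label}")
        return lines

    def apply(self, items: List[Item]) -> List[Item]:
        preds = list(self.user.values())
        if self.flow is not None:
            preds.append(self.flow)
        return [it for it in items if all(p(it) for p in preds)]


authors = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}, {"id": 3, "region": "EU"}]
state = FilterState()
state.add_user_filter("region = EU", lambda a: a["region"] == "EU")
state.set_flow_filter("authors of 2 sent papers", lambda a: a["id"] in {1, 2})
print([a["id"] for a in state.apply(authors)])  # [1]
state.reset_user_filters()
print(state.describe())                         # ['[flow] authors of 2 sent papers']
print([a["id"] for a in state.apply(authors)])  # [1, 2]
```

Keeping the two kinds of filters apart lets a "Reset" button promise exactly one thing, and `describe()` gives the interface a place to show which filters are active and where each came from.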
Three participants commented that the direction of citation was confusing because there is no visual connection between the two lists. One participant mentioned that the blue arrow showing the information flow between the two lists made him think that the paper list cites the citations list.
User satisfaction: The average satisfaction ratings are shown in Table 2. All ratings used a 9-point Likert scale, with 1=Disagree and 9=Agree. Overall, participants thought the system was not very easy to use but still felt fairly comfortable with it and liked using it. While participants indicated that labels were difficult to read, they rated the highlighting and arrows as helpful and the colors as appropriate. They gave positive ratings to the help and history systems and felt it was easy to correct mistakes.
Table 2. Average Likert-scale ratings for NetLens (1=Disagree, 9=Agree).
As is often the case, this usability study was very useful in uncovering problems that had not surfaced in feedback collected during personal demonstrations. While we observed many usability problems, NetLens received very positive feedback, such as 'It is very nice,' 'It is awesome,' 'I really like it,' and 'Good work!' One participant became very excited and offered several real-life usage scenarios. Many participants commented that NetLens is not very difficult to use, considering the intrinsic complexity of the data and the possible queries. Overall, we found the results of the test sobering but encouraging, as there are many ways to address the problems users encountered: better training to explain the inter and intra relationship flow between windows, better graphic design to show the status of flows and filters and their impact on the data presented, and better labeling, particularly of the different counts available.
We interpret this study as showing that the core design of NetLens works and that, once the usability problems we found are resolved, it has the potential to offer a strong solution for these kinds of tasks.
Generality
Evaluating the generality of the system means assessing how easily new applications can be created for new domains and data models. To explore the generality of the NetLens design, we connected NetLens to two other data sets. Our first attempt was to explore the Enron e-mail collection, which contains about half a million e-mails and 87,000 people who sent and received them.29 In about 3 days, the first author and main developer of NetLens was able to design the new data model, produce data converters, and customize NetLens to create the prototype. An estimated 200 lines of code were needed to modify the software; most of the effort went into designing the data model and manipulating the data. Our prototype shows e-mails on the left and people on the right. An example sequence of queries is: 'Find the e-mails with certain characteristics (e.g. rated as confidential)', then 'Find who received those e-mails', 'Are there people who received significantly more e-mails like this?', 'For those people, what were the characteristics of the e-mails they received (e.g. topic distribution)?', and 'Read the e-mails that concerned pricing issues'.
A second application was developed to explore legal precedents. The database of legal cases, collected by a team of researchers from the Department of Government and Politics at the University of Maryland, contains 2780 federal judicial decisions from 1978 to 2005 concerning the legal issue known as 'regulatory takings.' Within a day, we designed the new data model, produced data converters, and customized NetLens. We estimate that fewer than 100 lines of code were needed. Again, most of the effort went into designing the data model and processing the data. The prototype shows the cases on the left and the courts that handled them on the right.
Based on these two earlier applications, the NetLens source code was revised and restructured to minimize the effort needed to apply a new data set to NetLens: users change just a few lines of constant values, without needing knowledge of the database or SQL queries. The generality of this version of the design has been tested using personal photo library data, which was successfully loaded into NetLens within a few minutes (excluding data conversion time) just by changing the constant values. The real generality, however, will be better evaluated when other people create their own applications, which requires documentation and possibly tools to facilitate the customization of NetLens.
Further evaluation
We hope to further evaluate NetLens by reviewing the range and complexity of queries that can be answered, in particular by characterizing the queries that cannot be answered, and then comparing the complexity of the user's task with that of other approaches (e.g. writing SQL queries), either theoretically or empirically.
Conclusion
Our analysis of Entity-Relationship models of common network data sets enabled us to describe an abstract Content-Actor data model, which we used in the design of NetLens. The NetLens interface allows users to explore network data incrementally and refine their queries iteratively, so that they can accomplish complex network tasks that are usually composed of a sequence of low-level tasks.
NetLens is general in that it applies to any data set that can be represented with the Content-Actor data model. The NetLens interface design is also scalable because it is built from common, simple components such as histograms and lists, and therefore avoids visual overviews that display one visual element for every entity instance in the data set.
This is just a beginning, but NetLens appears promising. Generalizing and simplifying common network data sets to the abstract Content-Actor data model and connecting them to NetLens might help users complete required tasks more rapidly and reliably while offering rich support for real analytic tasks. In addition, NetLens can address some classic problems of node-link diagrams, such as the lack of support for attribute-based network analytic tasks, the difficulty of scaling up the visualization, and the decline in readability as network complexity grows.
NetLens does have limitations in supporting topology-based analytic tasks or link analysis tasks, such as determining the reachability of a node, the number of paths between nodes, or the shortest path between nodes. We recognize that these tasks are also important and difficult network analytic problems, for which node-link diagrams may be better suited. However, we believe there are significant benefits in using a consistent and simple user interface to explore a variety of Content-Actor network data sets.
Finally, NetLens demonstrates powerful techniques for exploring network data that serve both as an alternative and as a complement to classic node-link diagrams.
References
1. Information Visualization 2004 contest: The History of InfoVis, www.cs.umd.edu/hcil/iv04contest.
2. Kang H, Plaisant C, Lee B, Bederson BB. NetLens: Iterative Exploration of Content-Actor Network Data. To appear in: Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST'06), 2006.
3. Lee B, Czerwinski M, Robertson GG, Bederson BB. Understanding Research Trends in Conferences Using PaperLens. In: Extended Abstracts of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press: New York, 2005; pp. 1969–1972.
4. Freeman L. Visualizing Social Networks. J Soc Struct 2000; 1.
5. Herman I, Melançon G, Marshall MS. Graph Visualization and Navigation in Information Visualization: a Survey. IEEE Transactions on Visualization and Computer Graphics 2000; 6: 24–43.
6. An Atlas of Cyberspace: www.cybergeography.org/atlas.
7. Visual Complexity: www.visualcomplexity.com.
8. Di Battista G, Eades P, Tamassia R, Tollis IG. Graph Drawing: Algorithms for the Visualization of Graphs. Prentice-Hall, 1999.
9. Ghoniem M, Fekete JD, Castagliola P. A Comparison of the Readability of Graphs using Node-Link and Matrix-Based Representations. In: Proceedings of the IEEE Symposium on Information Visualization (InfoVis'04), 2004; pp. 17–24.
10. Börner K, Chen C. Visual interfaces to digital libraries: motivation, utilization, and socio-technical challenges. In: Visual Interfaces to Digital Libraries (JCDL 2002 Workshop). Springer-Verlag: Berlin, 2002; pp. 1–12.
11. Mackinlay JD, Rao R, Card SK. An Organic User Interface for Searching Citation Links. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'95), 1995; pp. 67–73.
12. Nowell LT, France RK, Hix D. Exploring Search Results with Envision. In: Extended Abstracts of the SIGCHI Conference on Human Factors in Computing Systems (CHI'97). ACM Press: New York, 1997; pp. 14–15.
13. Shneiderman B, Feldman D, Rose A, Grau XF. Visualizing Digital Library Search Results with Categorical and Hierarchical Axes. In: Proceedings of the Fifth ACM Conference on Digital Libraries. ACM Press: New York, 2000; pp. 57–66.
14. Baldonado MQW, Woodruff A, Kuchinsky A. Guidelines for using multiple views in information visualization. In: Proceedings of the Working Conference on Advanced Visual Interfaces (AVI '00). ACM Press: New York, 2000; pp. 110–119.
15. North C, Shneiderman B. Component-Based, User-Constructed, Multiple View Visualization. In: Extended Abstracts of the SIGCHI Conference on Human Factors in Computing Systems (CHI'01). Springer: Berlin, 2001; pp. 201–202.
16. Spotfire: www.spotfire.com.
17. Becker R, Cleveland W. Brushing Scatter Plots. Technometrics 1987; 29: 127–142.
18. Roth SF, Lucas P, Senn JA, Gomberg CC, Burks MB, Stroffolino PJ, Kolojejchick JA, Dunmire C. Visage: a user interface environment for exploring information. In: Proceedings of the IEEE Symposium on Information Visualization (InfoVis'96), 1996; pp. 3–12.
19. Dumais S, Cutrell E, Cadiz J, Jancke G, Sarin R, Robbins DC. Stuff I've Seen: A System for Personal Information Retrieval and Re-use. In: Proceedings of the 26th ACM SIGIR Conference, 2003; pp. 72–79.
20. Jones S. Graphical Query Specification and Dynamic Result Previews for a Digital Library. In: Proceedings of the Symposium on User Interface Software and Technology (UIST '98). ACM Press: New York, 1998; pp. 143–151.
21. Carey J, Haas L, Maganty V, Williams J. PESTO: an Integrated Query/Browser for Object Databases. In: Proceedings of the International Conference on Very Large Data Bases (VLDB '96). Morgan Kaufmann: San Francisco, 1996; pp. 203–214.
22. Silberschatz A, Korth HF, Sudarshan S. Database System Concepts. 3rd ed. McGraw Hill: New York, 1999.
23. Kang H, Plaisant C, Shneiderman B. New Approaches to Help Users Get Started with Visual Interfaces: Multi-Layered Interfaces and Integrated Initial Guidance. In: Proceedings of the Digital Government Research Conference, 2003; pp. 141–146.
24. Shneiderman B. Promoting Universal Usability with Multi-Layer Interface Design. In: Proceedings of the ACM Conference on Universal Usability, 2003; pp. 1–8.
25. Lee B, Parr CS, Plaisant C, Bederson BB, Veksler VD, Gray WD, Kotfila C. TreePlus: Interactive Exploration of Networks with Enhanced Tree Layouts. IEEE Transactions on Visualization and Computer Graphics (Special Issue on Visual Analytics) 2006; 12: 1414–1426.
26. Bederson BB, Grosjean J, Meyer J. Toolkit Design for Interactive Structured Graphics. IEEE Trans Software Eng 2004; 30(8): 535–546. See also: Piccolo.NET, www.cs.umd.edu/hcil/piccolo.
27. White R, Kules B, Drucker S, Schraefel MC. Supporting Exploratory Search. Special Issue, Communications of the ACM, April 2006; 49: 36–39.
28. Plaisant C. The Challenge of Information Visualization Evaluation. In: Proceedings of the Working Conference on Advanced Visual Interfaces (AVI 2004). ACM Press: New York, 2004; pp. 109–116.
29. The public Enron email data set: www.cs.cmu.edu/~enron.
Acknowledgements
We thank Mary Czerwinski and George Robertson of Microsoft Research for their work on PaperLens, the motivating work for this project, and for their topic analysis of CHI papers, which we used in the current prototype. We thank Doug Oard for suggesting the Content-Actor terminology, and the participants of the user study for their efforts and valuable comments. This work was supported in part by Booz-Allen Hamilton.