Clustering visualizations of multidimensional data.

Journal of Computational & Graphical Statistics

December 01, 2004 | Hurley, Catherine B. | COPYRIGHT 2004 American Statistical Association. This material is published under license from the publisher through the Gale Group, Farmington Hills, Michigan. All inquiries regarding rights should be directed to the Gale Group.

Many graphical methods for displaying multivariate data consist of arrangements of multiple displays of one or two variables: scatterplot matrices and parallel coordinates plots are two such methods. In principle these methods generalize to arbitrary numbers of variables, but they become difficult to interpret for even moderate numbers of variables. This article demonstrates that the impact of high dimensions is much less severe when the component displays are clustered together according to some index of merit. Effectively, this clustering reduces the dimensionality and makes interpretation easier. For scatterplot matrices and parallel coordinates plots, clustering of component displays is achieved by finding suitable permutations of the variables. I discuss algorithms based on cluster analysis for finding permutations, and present examples using various indices of merit.

Key Words: Parallel coordinates; Permutation of variables; Projection pursuit; Scatterplot matrices.

1. INTRODUCTION

Datasets of three or more dimensions are notoriously difficult to display on a two-dimensional screen or on a piece of paper. Many graphical methods for displaying multivariate data consist of arrangements of multiple displays of one or two variables--for example, a scatterplot matrix consists of all pairwise scatterplots of two variables arranged in a square matrix, and a parallel coordinates display is a sequence of one-dimensional dotplots where line segments are drawn to connect the dots pertaining to a particular case. While in principle these methods generalize to arbitrary numbers of variables, in practice as the dimensions increase, they become less effective, presenting us with an overwhelming amount of information that is difficult to absorb. Usually, the ordering of the variables in these displays is arbitrary and corresponds to the order in which the variables were listed in the data file. However, the interpretability and effectiveness of visualizations often improve dramatically when the variables are reordered in some systematic way.

A scatterplot matrix shows all pairwise scatterplots of p variables, while a parallel coordinate display shows p - 1 of the p(p - 1)/2 pairwise line plots. Some of these pairwise plots are more interesting or informative than others, and an effective visualization should help us to focus on these. Our basic idea is that each pairwise display (a panel) is awarded a merit score measuring its "interestingness." Then the variables are reordered so that the viewer's attention will be focused on the most interesting panels, which are placed in prominent positions. For the scatterplot matrix, we consider positions close to the diagonal to be the most prominent, while for the parallel coordinate display interesting panels should be among the p - 1 visible panels. Suitable merit measures will depend on the context of the data and the type of display, but correlation is often a good starting point. Then the visualizations will help us identify clusters of similar (highly correlated) variables, effectively reducing the dimensionality of the visualization problem.
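As a rough illustration of the merit idea, the sketch below scores every pairwise panel by absolute Pearson correlation and then totals the merits of the p - 1 adjacent panels that a parallel coordinates display would show for a given variable order. The function names (`pearson`, `panel_merits`, `visible_merit`), the toy data, and the choice of a simple sum as the display score are all assumptions made for this example; the article does not prescribe them.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def panel_merits(columns):
    """Merit |r| for every unordered pair of variables (one per panel)."""
    return {
        frozenset((u, v)): abs(pearson(columns[u], columns[v]))
        for u, v in combinations(columns, 2)
    }


def visible_merit(order, merit):
    """Total merit of the p - 1 adjacent panels shown by a parallel
    coordinates display that uses this variable order (assumed score)."""
    return sum(merit[frozenset(pair)] for pair in zip(order, order[1:]))


# Made-up toy data: four variables, five cases.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "c": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * a, so |r(a, c)| = 1
    "d": [2.0, 1.0, 4.0, 3.0, 5.0],
}

merit = panel_merits(toy)
file_order = list(toy)                 # order as listed in the "data file"
reordered = ["b", "a", "c", "d"]       # puts the a-c panel among the visible ones

print(len(merit))                      # number of pairwise panels, p(p - 1)/2
print(visible_merit(file_order, merit))
print(visible_merit(reordered, merit))
```

In this toy case the reordered display scores higher because the perfectly correlated pair `a`, `c` becomes one of the visible adjacent panels.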

Ideally, the panel merit scores are combined into an overall merit score for the entire display. We could then find the permutation of the variables maximizing this overall score. A brute-force approach to solving this problem evaluates the criterion on all possible permutations of the variables, but this is slow except for small numbers of variables. Because our goal is effective data visualization, it is probably better to find a good display quickly than to wait for an optimal display that is only slightly better. Therefore, we use a fast ad-hoc algorithm based on cluster analysis (Gruvaeus and Wainer 1972) to come up with suitable permutations of the variables. In our experience the resulting visualizations are often far more effective than those using standard variable order.
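The trade-off between exhaustive search and a cluster-based heuristic can be sketched as follows. The brute-force function scans all p! permutations. The heuristic is a simplified stand-in, not the Gruvaeus and Wainer procedure itself: it merges clusters by average merit and, at each merge, orients the two ordered pieces so that the endpoints that meet have the highest merit. The merit values, the names, and the sum-of-adjacent-merits score are hypothetical choices for this example.

```python
from itertools import permutations


def path_merit(order, merit):
    """Sum of merits over adjacent variables in the order (assumed score)."""
    return sum(merit[frozenset(pair)] for pair in zip(order, order[1:]))


def brute_force_order(names, merit):
    """Evaluate all p! permutations; feasible only for small p."""
    return max(permutations(names), key=lambda order: path_merit(order, merit))


def cluster_order(names, merit):
    """Greedy agglomerative ordering (simplified illustration)."""
    clusters = [[name] for name in names]

    def average_merit(c1, c2):
        total = sum(merit[frozenset((u, v))] for u in c1 for v in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the two clusters with the highest average pairwise merit.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: average_merit(clusters[ij[0]], clusters[ij[1]]),
        )
        left, right = clusters[i], clusters[j]
        # Try each orientation; keep the one whose joined endpoints score best.
        candidates = [(l, r) for l in (left, left[::-1]) for r in (right, right[::-1])]
        l, r = max(candidates, key=lambda lr: merit[frozenset((lr[0][-1], lr[1][0]))])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [l + r]
    return clusters[0]


# Hypothetical merit scores for four variables.
names = ["a", "b", "c", "d"]
merit = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.2,
    frozenset(("a", "d")): 0.1,
    frozenset(("b", "c")): 0.7,
    frozenset(("b", "d")): 0.3,
    frozenset(("c", "d")): 0.8,
}

best = brute_force_order(names, merit)
quick = cluster_order(names, merit)
print(best, path_merit(best, merit))
print(quick, path_merit(quick, merit))
```

On this small example the heuristic happens to match the exhaustive optimum; in general it offers no such guarantee, which is the price paid for avoiding the factorial search.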

The problem of choosing an ordering of variables for displays of multivariate data has received surprisingly little attention in the literature. The work of Bertin is an exception in this regard; ordering variables, cases, and categories in so-called "matrix displays" is a major theme of his work (Bertin 1983).
