An evaluation of self-organizing map networks as a robust alternative to factor analysis in data mining applications.

Information Systems Research

June 01, 2001 | Kiang, Melody Y.; Kumar, Ajith | COPYRIGHT 2001 Institute for Operations Research and the Management Sciences. This material is published under license from the publisher through the Gale Group, Farmington Hills, Michigan. All inquiries regarding rights should be directed to the Gale Group.

Kohonen's self-organizing map (SOM) network is one of the most important network architectures developed during the 1980s. The main function of SOM networks is to map the input data from an n-dimensional space to a lower-dimensional (usually one- or two-dimensional) plot while preserving the original topological relations. The SOM can therefore be viewed as an analog of factor analysis. In this research, we evaluate the feasibility of using SOM networks as a robust alternative to factor analysis and clustering for data mining applications. Specifically, we compare SOM network solutions to factor analytic and K-Means clustering solutions on simulated data sets with known underlying factor and cluster structures. The comparisons indicate that SOM networks generally provide solutions superior to unrotated factor solutions and recover underlying cluster structures more accurately when the input data are skewed. Our findings suggest that SOM networks can provide robust alternatives to traditional factor analysis and clustering techniques in data mining applications.

(Data Mining; Kohonen Networks; Factor Analysis; Data Reduction; Clustering Analysis)

1. Introduction

With the increased availability of data collected from the Internet and other sources and the implementation of enterprise-wide databases, the amount of data that companies possess is growing at a phenomenal rate. Hence, it becomes increasingly important for companies to manage their databases better. Data mining is concerned with identifying interesting patterns and presenting them in a concise and meaningful manner (Piatetsky-Shapiro and Frawley 1991). Data mining tools and techniques that facilitate automated and intelligent database analysis and interpretation have been proposed, and some have been successfully implemented (Fayyad et al. 1996, Westphal and Blaxton 1998, Balachandran et al. 1999).

The widespread availability of data mining software has given practitioners a variety of new alternatives to traditional statistical data analytic techniques. These alternatives include several techniques based on concepts from machine learning, pattern recognition, and neural networks (Chen et al. 2000, Vanecko and Russo 1999, Spangler et al. 1999, Cooper and Giuffrida 2000). Many of these newer techniques serve the same data analytic objectives as traditional statistical analysis: regression, data reduction, clustering, etc. Often, results obtained using newer data mining techniques are interpreted and utilized in the same manner as those obtained with statistical modeling. For example, the problem of market segmentation involves partitioning a population (of consumers) into relatively homogeneous subsets, so that each subset (segment) can be targeted using a marketing program tailored specifically to the needs of consumers in that subset. In practice, data from a sample of customers (drawn from the relevant population) are analyzed to estimate the segments (number and relative sizes) using a clustering procedure such as K-Means clustering; frequently, the data are preprocessed using factor analysis to reduce dimensionality and facilitate managerial interpretability, and the clustering is done using factor scores (e.g., Dillon et al. 1985, Doyle and Saunders 1985). Both the preprocessing for data reduction and the clustering task can now be accomplished using algorithms based on neural networks.
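To make the traditional clustering step in the segmentation example concrete, the following is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions, not the procedure used in the cited studies: the factor-analysis preprocessing is omitted, the `customers` values are hypothetical two-dimensional factor scores invented for the example, and the function names `kmeans` and `nearest` are our own.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical factor scores for six customers (assumed data, two segments).
customers = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
             (10.0, 10.1), (9.8, 10.3), (10.2, 9.9)]

centroids, labels = kmeans(customers, k=2)
print(labels)
```

Note that the number of segments `k` must be supplied in advance, and that K-Means can settle in a local optimum that depends on the initial centroids; in practice several random starts are usually compared.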

The substitution of neural network-based techniques in the place of statistical modeling techniques needs justification on grounds other than that of novelty. A general a priori justification for preferring neural network-based approaches to statistical ones is that they do not require the invocation of assumptions about the underlying data generating mechanisms (e.g., the distributional assumption of multivariate normality that is invoked to justify the use of several multivariate statistical modeling procedures). On the other hand, statistical techniques provide a wealth of diagnostics that can be used to rigorously evaluate alternative solutions (e.g., error bounds and confidence intervals for parameter estimates, hypothesis testing, etc.). In this paper, we attempt to provide additional justification by presenting preliminary evidence that the SOM network is a robust alternative to factor analysis.

While Kohonen's self-organizing networks have been successfully applied as a classification tool to various problem domains, including speech recognition (Zhao and Rowden 1992, Leinonen et al. 1993), image data compression (Manikopoulos 1993), image or character recognition (Bimbo et al. 1993, Sabourin and Mitiche 1993), robot control (Walter and Schulen 1993, Ritter et al. 1989), and medical diagnosis (Vercauteren et al. 1990), their potential as a robust substitute for factor analysis and as a clustering tool remains relatively unexplored. Murtagh and Hernandez-Pajares (1995) examined a number of properties of SOM networks and compared them with various methods of data analysis, including principal components and K-Means clustering. Clustering is considered an important data mining technique that can be applied to various problem domains. However, when the dimensionality of the problem is high, that is, when a very large number of attributes (variables) is involved, the size of the search space for model induction grows in a combinatorially explosive manner. Moreover, high dimensionality increases the chance that a data mining algorithm will find spurious patterns that are not valid. Approaches to this problem include methods to reduce the effective dimensionality of the problem and the use of prior knowledge to identify irrelevant variables (Fayyad et al. 1996). The application of SOM networks as an alternative to factor analysis can reduce the problem space from many dimensions to a few.
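The dimension-reducing behavior of a SOM described above can be illustrated with a small sketch in plain Python. This is a generic textbook-style SOM written for illustration, not the network configuration evaluated in this paper: the linear decay schedules, the Gaussian neighborhood, the 3 x 3 map size, the default parameter values, and the four-dimensional `samples` are all assumptions made for the example, and the names `train_som` and `best_matching_unit` are our own.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((a - b) ** 2 for a, b in zip(x, weights[node])))

def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One weight vector per map node, initialized at random in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.01    # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighborhood measured on the map grid, not in input space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights

# Assumed four-dimensional inputs forming two well-separated groups.
samples = [(0.10, 0.15, 0.10, 0.05), (0.15, 0.10, 0.05, 0.10), (0.05, 0.10, 0.15, 0.10),
           (0.90, 0.85, 0.90, 0.95), (0.85, 0.90, 0.95, 0.90), (0.95, 0.90, 0.85, 0.90)]

som = train_som(samples, rows=3, cols=3)
for x in samples:
    print(x, "->", best_matching_unit(som, x))
```

Each four-dimensional input is summarized by the two-dimensional grid position of its best matching unit, which is the sense in which the map reduces the problem space. Because neighboring nodes are updated together, inputs that are close in the original space tend to land on the same or nearby nodes.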
