Kohonen's self-organizing map (SOM) network is one of the most important network architectures developed during the 1980s. The main function of SOM networks is to map input data from an n-dimensional space to a lower-dimensional (usually one- or two-dimensional) grid while preserving the original topological relations. Because it reduces dimensionality in this way, a SOM can be viewed as an analog of factor analysis. In this research, we evaluate the feasibility of using SOM networks as a robust alternative to factor analysis and clustering for data mining applications. Specifically, we compare SOM network solutions with factor analytic and K-Means clustering solutions on simulated data sets with known underlying factor and cluster structures. The comparisons indicate that SOM networks generally provide solutions superior to unrotated factor solutions and recover underlying cluster structures more accurately when the input data are skewed. Our findings suggest that SOM networks can provide robust alternatives to traditional factor analysis and clustering techniques in data mining applications.
(Data Mining; Kohonen Networks; Factor Analysis; Data Reduction; Cluster Analysis)
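The mapping described in the abstract can be illustrated with a minimal sketch of online Kohonen training in plain Python. For each input, the best-matching unit (the node whose weight vector is closest to the input) is found, and that node and its grid neighbors are pulled toward the input, weighted by a Gaussian of their distance on the map grid. The function names `train_som` and `best_matching_unit`, the initialization from randomly chosen inputs, and the linear decay of the learning rate and neighborhood radius are illustrative assumptions, not the configuration used in this study.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position (row, col) whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen training of a rows x cols map on equal-length vectors.

    Hypothetical helper: the schedules and defaults are assumptions for illustration.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumed initialization: every node starts at a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Neighborhood distance is measured on the low-dimensional grid,
                # not in input space; this is what preserves topological relations.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    rng = random.Random(1)
    group_a = [tuple(rng.gauss(0.0, 0.1) for _ in range(3)) for _ in range(20)]
    group_b = [tuple(rng.gauss(5.0, 0.1) for _ in range(3)) for _ in range(20)]
    som = train_som(group_a + group_b, rows=1, cols=4)
    print(best_matching_unit(som, (0, 0, 0)), best_matching_unit(som, (5, 5, 5)))
```

Because the neighborhood is computed on the grid rather than in input space, adjacent nodes come to represent similar inputs, which is the topology-preserving property the abstract refers to. In the demo, the two well-separated groups would be expected to occupy different nodes of the 1 x 4 map.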
1. Introduction
With the increased availability of data collected from the Internet and other sources and the implementation of enterprise-wide databases, the amount of data that companies possess is growing at a phenomenal rate. Hence, it is increasingly important for companies to manage their databases effectively. Data mining is concerned with identifying interesting patterns and presenting them in a concise and meaningful manner (Piatetsky-Shapiro and Frawley 1991). Data mining tools and techniques that facilitate automated and intelligent database analysis and interpretation have been proposed, and some have been successfully implemented (Fayyad et al. 1996, Westphal and Blaxton 1998, Balachandran et al. 1999).
The widespread availability of data mining software has given practitioners a variety of new alternatives to traditional statistical data analysis techniques. These alternatives include several techniques based on concepts from machine learning, pattern recognition, and neural networks (Chen et al. 2000, Vanecko and Russo 1999, Spangler et al. 1999, Cooper and Giuffrida 2000). Many of these newer techniques pursue the same data analytic objectives as traditional statistical analysis: regression, data reduction, clustering, and so on. Results obtained with the newer data mining techniques are often interpreted and used in the same manner as those obtained with statistical modeling. For example, the problem of market segmentation involves partitioning a population of consumers into relatively homogeneous subsets, so that each subset (segment) can be targeted with a marketing program tailored specifically to the needs of the consumers in it. In practice, data from a sample of customers drawn from the relevant population are analyzed with a clustering procedure such as K-Means clustering to estimate the segments (their number and relative sizes). Frequently, the data are first preprocessed with factor analysis to reduce dimensionality and aid managerial interpretation, and the clustering is then performed on the factor scores (e.g., Dillon et al. 1985, Doyle and Saunders 1985). Both the data reduction preprocessing and the clustering task can now be accomplished with algorithms based on neural networks.
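The clustering step in this segmentation workflow can be sketched with Lloyd's K-Means algorithm, which alternates between assigning each observation to its nearest centroid and recomputing each centroid as the mean of its members until the assignments stop changing. This is a minimal sketch under stated assumptions: the helper names `kmeans` and `sq_dist` are hypothetical, centroids are initialized by sampling k observations without replacement, and the input rows stand in for the factor scores that a prior factor analysis would supply (the factor analysis itself is not shown).

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-Means on a list of vectors; returns (centroids, labels).

    Hypothetical helper: `points` would typically be factor scores per customer.
    """
    rng = random.Random(seed)
    # Assumed initialization: k observations sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not change
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an emptied cluster's centroid where it was
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    # Illustrative two-dimensional "factor scores" for eight customers.
    scores = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
              (5.0, 5.0), (5.2, 4.9), (4.9, 5.3), (5.1, 5.1)]
    centroids, labels = kmeans(scores, k=2)
    print(labels, centroids)
```

In practice, K-Means is usually restarted from several initializations and the solution with the smallest within-cluster sum of squares is kept, because a single run can settle in a local optimum.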