Product-driven data mining. (English) Zbl 1101.68510

Authors’ conclusion: For a given product, two tasks were required: first, identifying the consumers who would react favorably to the product, and second, inferring other characteristics of these consumers.
This was accomplished with a twofold strategy. By ranking the difference of means across all of the factors, the questions that best characterize favorable consumers can be identified. Once these are identified, the data mine can be used to estimate the power of a given strategy to correctly identify a given individual. For case study A, the identification algorithm was simply to use an individual's home value. With an optimized cutoff level, this single variable correctly identified 41% of the individuals in the data mine. Using a combination of questions could increase this percentage. Performing a cluster analysis rooted at these key identifying questions allows other characteristics of these consumers to be inferred.
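The difference-of-means ranking and the cutoff optimization described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `rank_by_mean_difference` and `best_cutoff`, the scaling by the overall standard deviation, the rule "value at or above the cutoff means favorable", the use of the fraction of correctly classified respondents as the identification power, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def rank_by_mean_difference(X, favorable):
    """Order questions (columns) by the absolute difference of group means.

    Assumption: differences are scaled by each column's overall standard
    deviation so that questions on different scales are comparable.
    """
    X = np.asarray(X, dtype=float)
    favorable = np.asarray(favorable, dtype=bool)
    diff = X[favorable].mean(axis=0) - X[~favorable].mean(axis=0)
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0          # avoid dividing by zero for constant columns
    score = np.abs(diff) / scale
    order = np.argsort(-score)       # largest score first
    return order, score

def best_cutoff(x, favorable):
    """Scan every observed value as a cutoff for the rule `x >= cutoff`.

    Returns the cutoff with the highest fraction of correctly classified
    respondents (an assumed definition of identification power).
    """
    x = np.asarray(x, dtype=float)
    favorable = np.asarray(favorable, dtype=bool)
    best_c, best_acc = None, -1.0
    for c in np.unique(x):
        acc = np.mean((x >= c) == favorable)
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc

# Synthetic survey: 100 respondents, 3 questions; question 1 separates the groups.
rng = np.random.default_rng(0)
favorable = np.array([True] * 50 + [False] * 50)
X = rng.normal(size=(100, 3))
X[:50, 1] += 3.0

order, score = rank_by_mean_difference(X, favorable)
cutoff, accuracy = best_cutoff(X[:, order[0]], favorable)
print("best question:", order[0], "cutoff:", cutoff, "accuracy:", accuracy)
```

Scanning only the observed values keeps the search exhaustive for a single variable. Combining several questions, as the conclusion suggests, would need a different search and is not attempted here.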
The two case studies show that identifying the questions that significantly differentiate respondents with respect to a given product is a fundamental part of the process. Failure to make this identification decreases the resolution of the subsequent cluster analysis. Case study A exemplifies the situation in which there is a clear separation with respect to a product, whereas case study B illustrates the decrease in resolution when no clear separation exists. In general, this first step may depend on the type of data and the desired differentiations. A combination of factor analysis and the consideration of differences in basic test statistics proved to be superior to methods based on latent variables or principal components, due to the underlying eigenstructure of the data mine.
To increase the capability of this method, future advances should include a more sophisticated clustering algorithm. For example, PLS/SVD could be used on the clustering subgroups after the first step of separating with the difference-of-means statistic. In addition, automatic determination of the identification power for a given set of identifying questions should also be addressed.
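As a rough illustration of the proposed extension, an SVD could be applied to one subgroup after the initial separation. The sketch below is only one possible reading of that suggestion: the function name `subgroup_components`, the column centering, the choice of two components, and the synthetic data are assumptions, and the PLS variant is not shown.

```python
import numpy as np

def subgroup_components(X, mask, k=2):
    """SVD of the column-centered responses of one subgroup.

    Returns the first k right singular vectors (question loadings) and the
    share of the subgroup's variance each one explains.
    """
    sub = np.asarray(X, dtype=float)[np.asarray(mask, dtype=bool)]
    centered = sub - sub.mean(axis=0)
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt[:k], explained[:k]

# Synthetic data: question 3 is nearly a multiple of question 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
X[:, 3] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=60)

# Hypothetical subgroup produced by a first separation step on question 1.
mask = X[:, 1] > 0
components, explained = subgroup_components(X, mask)
print(components.shape, explained)
```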

MSC:

68P15 Database theory