Research Group of Prof. Dr. M. Griebel
Institute for Numerical Simulation
maximize

Data mining with sparse grids

Participants

Jochen Garcke, Prof. Dr. Michael Griebel, Dr. Michael Thess (prudsys AG), Dr. Ulrich Windheuser (West LB)

Description

Data mining is the process of finding hidden patterns, relations and trends in large data sets. It plays an increasing role in commerce and science. We introduce a new data mining technique for the classification problem.

The technique is based on the regularization network approach but, in contrast to other methods which employ ansatz functions associated to data points, we use a grid in a high-dimensional feature space for the minimization process.

To cope with the curse of dimensionality, we employ sparse grids. Thus, only $O(h_n^{-1} n^{d-1})$ instead of $O(h_n^{-d})$ grid points and unknowns are involved. Here $d$ denotes the dimension of the feature space and $h_n = 2^{-n}$ gives the mesh size. To be precise we use the sparse grid combination technique where the classification problem is discretized and solved on a certain sequence of conventional grids with uniform mesh sizes in each coordinate direction. The sparse grid solution is then obtained from the solutions on these different grids by linear combination. In contrast to other sparse grid techniques, the combination method is simpler to use and can be parallelized in a natural and straightforward way.

The method computes a nonlinear classifier but scales only linearly with the number of data points and is well suited for data mining applications where the amount of data is very large, but where the dimension of the feature space is moderately high. Our new method achieves correctness rates which are competitive to those of the best existing methods.

Examples

Cooperation

BMBF "Neue Mathematische Verfahren in Industrie und Dienstleistungen"

References

[1] J. Garcke, M. Hegland, and O. Nielsen. Parallelisation of sparse grids for large scale data analysis. In P. Sloot, D. Abramson, A. Bogdanov, J. Dongarra, A. Zomaya, and Y. Gorbachev, editors, Proceedings of the International Conference on Computational Science 2003 (ICCS 2003) Melbourne, Australia, volume 2659 of Lecture Notes in Computer Science, pages 683-692. Springer, 2003.
[2] J. Garcke and M. Griebel. Classification with sparse grids using simplicial basis functions. Intelligent Data Analysis, 6(6):483-502, 2002. (shortened version appeared in KDD 2001, Proc. of the Seventh ACM SIGKDD, F. Provost and R. Srikant (eds.), pages 87-96, ACM, 2001).
[3] J. Garcke and M. Griebel. On the parallelization of the sparse grid approach for data mining. In S. Margenov, J. Wasniewski, and P. Yalamov, editors, Large-Scale Scientific Computations, Third International Conference, LSSC 2001, Sozopol, Bulgaria, volume 2179 of Lecture Notes in Computer Science, pages 22-32. Springer, 2001.
[4] J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. Computing, 67(3):225-253, 2001.