Tutorial Title: Data Mining
The amount of information in the world is estimated to double about every 20
months. Among the fastest growing sources of data are the internet, industrial
process supervision systems, business databases, biotechnology, and automatic
imaging. Besides data acquisition and storage, data processing and
exploitation are the biggest challenges. The goal is to extract the relevant
information (the "knowledge") from large data sets. This information
extraction process uses conventional (linear) statistical methods such as
correlation and regression, as well as methods from cluster analysis,
neural computation, and machine learning.
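
As a rough illustration of the conventional (linear) methods named above, the following sketch computes a Pearson correlation coefficient and a least-squares regression line in plain Python. The function names and the sample values are assumptions made for this example only; they are not taken from the tutorial material.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products and centered squares (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def linear_regression(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_correlation(x, y)
slope, intercept = linear_regression(x, y)
```

Both functions assume that the inputs have non-zero variance. A correlation close to 1 or -1 indicates a strong linear relationship, but not necessarily a causal one.
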
The collection of these data analysis methods is called "data mining". Data
mining is part of the knowledge discovery process, which also covers
preprocessing, filtering, visualization, transformation and feature
generation.
The tutorial introduces some of the most important methods for data mining
and knowledge discovery and presents some real-world application examples.
It is structured as follows:
1. introduction and definitions
2. data sources, characteristics, and distortions
3. preprocessing and filtering
4. visualization (projections, principal component analysis, multidimensional scaling, self-organizing maps)
5. data transformation and feature generation
6. data analysis (correlation, spurious correlation, regression, classification, decision trees, ID3, sequential clustering, c-means and its relatives, cluster estimation, radial basis functions, vector quantization)
7. application examples
Hard copies of the tutorial material will be provided.
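
To give an impression of the clustering methods listed under item 6, here is a minimal sketch of hard c-means (often called k-means) in plain Python. It is an illustration under stated assumptions, not the formulation used in the tutorial: the function name, the fixed iteration count, and the choice to initialize the centers with the first c data points are all assumptions made for this example.

```python
def hard_c_means(points, c, iterations=20):
    """Hard c-means: alternate between assigning points and updating centers."""
    # Assumption: use the first c points as initial centers (simple, deterministic).
    centers = [list(p) for p in points[:c]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center
        # (squared Euclidean distance).
        labels = [
            min(range(c),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        for k in range(c):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical two-dimensional data with two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = hard_c_means(data, c=2)
```

The result depends on the initialization, so practical implementations usually run the algorithm several times from different starting centers. Fuzzy c-means, one of the relatives of c-means, replaces the hard assignment with graded memberships.
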