Module: Data Mining
Semester 5 SC | Semester hours (C/TD/TP) | Total weekly hours (C/TD/TP) | Weekly hours by type (C, TD, TP) | Coef. | Credits |
---|---|---|---|---|---|
UE Fondamentales 5.1 | 67.5 | 4.5 | 1.5, 3 | 4 | 5 |
Course Description:
The course introduces basic data mining concepts and techniques for discovering interesting patterns hidden in data, including large-scale data sets. Topics covered include feature engineering, association analysis, clustering, and correlation analysis.
Prerequisites: Linear algebra, probability
Evaluation Method: Coursework (40%) + Final Exam (60%)
Course Content
- Introduction to Data Mining
- Data and Feature Engineering
- Information Theory
- Association Analysis: Frequent itemsets, association rules.
- Clustering of Data: Dissimilarity and scatter. K-means clustering, K-medoids clustering. Hierarchical clustering, interpreting clustering trees, different linkages, top-down and bottom-up. Determining the number of clusters.
- Dimensionality Reduction: Principal component analysis. Directions of maximal variance, or equivalently, approximating a matrix by another matrix of a given (smaller) rank. Interpretation of principal components, uses, limitations. Multidimensional scaling, Isomap, locally linear embedding.
- Factor Analysis
- Feature Selection: Objective function, methods and algorithms.
- Correlation Analysis: Correlation. Canonical correlation analysis. Zero correlation versus independence. Shortcomings of correlation for nonlinear relationships. Rank correlation, maximal correlation, distance correlation.
- Anomaly Detection
- Data Mining Case Studies
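As an illustration of the "zero correlation versus independence" item in the correlation analysis topic, here is a minimal Python sketch. It is not part of the official course material: the `pearson` helper and the sample data are assumptions chosen for illustration. It shows that a variable can be fully determined by another (here y = x²) while their Pearson correlation is essentially zero, which is the shortcoming of correlation for nonlinear relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical sample: 201 points placed symmetrically around zero.
xs = [i / 100 for i in range(-100, 101)]

# y depends deterministically on x, but the relationship is not linear.
ys_quadratic = [x * x for x in xs]

# For comparison, an exactly linear relationship.
ys_linear = [2 * x + 1 for x in xs]

r_quadratic = pearson(xs, ys_quadratic)  # close to 0 despite full dependence
r_linear = pearson(xs, ys_linear)        # close to 1

print(f"quadratic: r = {r_quadratic:.6f}")
print(f"linear:    r = {r_linear:.6f}")
```

The near-zero value arises because the sample is symmetric around zero, so positive and negative contributions to the covariance cancel. Rank correlation does not detect this non-monotonic dependence either; measures such as distance correlation are designed to do so.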
References
- Andrew R. Webb and Keith D. Copsey, Statistical Pattern Recognition, 3rd Edition, Wiley, 2011.
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2000.
- Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015.