Module: Data Mining

Semester 5 SC
Teaching unit: Fundamental 5.1
Semester hours (lecture/tutorial/lab): 67.5
Weekly hours, total (lecture/tutorial/lab): 4.5
Weekly hours by type (lecture, tutorial, lab): 1.5, 3
Coefficient: 4
Credits: 5

Course Description: 

The course introduces basic data mining concepts and techniques for discovering interesting patterns hidden in data, including large-scale data sets. Topics covered include feature engineering, association analysis, clustering, and correlation analysis.

Prerequisites: Linear algebra, Probability

Evaluation Method: Coursework (40%) + Final Exam (60%)

Course Content 

  • Introduction to Data Mining
  • Data and Feature Engineering
  • Information Theory
  • Association Analysis: Frequent itemsets, association rules.
  • Clustering of Data: Dissimilarity and scatter. K-means clustering, K-medoids clustering. Hierarchical clustering, interpreting clustering trees, different linkages, top-down and bottom-up approaches. Determining the number of clusters.
  • Dimensionality Reduction: Principal component analysis. Directions of maximal variance, or equivalently, approximating a matrix by another matrix with a given (smaller) rank. Interpretation of principal components, uses, limitations. Multidimensional scaling, Isomap, locally linear embedding.
  • Factor Analysis
  • Feature Selection: Objective function, methods and algorithms.
  • Correlation Analysis: Correlation. Canonical correlation analysis. Zero correlation versus independence. Shortcomings of correlation for nonlinear relationships. Rank correlation, maximal correlation, distance correlation.
  • Anomaly Detection
  • Data Mining Case Studies
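
As an illustration of the correlation analysis topic above (zero correlation versus independence, and rank correlation for nonlinear relationships), here is a minimal sketch in plain Python. The toy data and the helper names `pearson`, `ranks`, and `spearman` are assumptions made for this example and are not part of the course material.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank positions (0 = smallest). Assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical toy data, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_quadratic = [x ** 2 for x in xs]  # fully determined by x, not monotone
ys_cubic = [x ** 3 for x in xs]      # monotone in x, but nonlinear

# y = x^2 depends entirely on x, yet the linear correlation vanishes.
r_quadratic = pearson(xs, ys_quadratic)

# y = x^3 is monotone: Pearson understates the link, Spearman does not.
r_cubic = pearson(xs, ys_cubic)
rho_cubic = spearman(xs, ys_cubic)

print("Pearson,  y = x^2:", r_quadratic)
print("Pearson,  y = x^3:", round(r_cubic, 4))
print("Spearman, y = x^3:", rho_cubic)
```

The quadratic case shows that zero correlation does not imply independence. The cubic case shows why a rank-based measure can be preferable when the relationship is monotone but not linear.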

References

  • Andrew R. Webb and Keith D. Copsey, Statistical Pattern Recognition, 3rd Edition, Wiley, 2011.
  • Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2000.
  • Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015.