MMS: Data Mining (2018-I)

Course contents

Chapter 1. Introduction to Python

Session 1

  • Downloading and installing Python
  • Installing and loading packages
  • Reading and writing data
  • Converting the types of character variables
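As an illustration of the reading, writing, and type-conversion topics above, here is a minimal sketch that uses only the Python standard library. The column names and values are hypothetical and are not taken from the course files; the course notebooks may rely on other packages.

```python
import csv
import io

# Hypothetical sample data; column names are illustrative only.
rows = [
    {"name": "Ana", "age": "34", "income": "2500.50"},
    {"name": "Luis", "age": "41", "income": "3100.00"},
]

# Write the rows to an in-memory CSV.
# For a real file, replace io.StringIO() with open("data.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age", "income"])
writer.writeheader()
writer.writerows(rows)

# Read the CSV back: every field arrives as a character string.
buffer.seek(0)
records = list(csv.DictReader(buffer))

# Convert the character fields to numeric types explicitly.
for record in records:
    record["age"] = int(record["age"])
    record["income"] = float(record["income"])

print(records)
```

The explicit `int` and `float` calls make the conversion step visible, which is the point of the exercise; a data-frame library would typically infer these types on load.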
  Files:

  • Presentation: File:MMS DataMining1.pdf
  • First script [1]

Chapter 2. Data Exploration with Real Live Data

      Session 2

      • Using Python to manipulate data
      • Reading a dataset from a CSV file
      • Applying basic statistics
      • Working with univariate descriptive statistics in Python
      • Computing correlations
      • Working with probability distributions in Python
      • Fitting a linear regression model with Python
      • Summarizing linear model fits
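The correlation and linear regression topics listed above can be sketched with plain Python as follows. The data points are invented for illustration, and the formulas are the usual least-squares expressions rather than anything specific to the course notebooks.

```python
import statistics

# Hypothetical paired observations (not from the course datasets).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)

# Sums of squares and cross-products around the means.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Pearson correlation coefficient.
r = sxy / (sxx * syy) ** 0.5

# Ordinary least-squares fit of y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# For simple linear regression, R^2 equals the squared correlation.
r_squared = r ** 2

print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f} R^2={r_squared:.4f}")
```

A statistics package would also report standard errors and p-values for the coefficients; this sketch covers only the point estimates.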

      Files for Download:

      • Session 2 (iPython Notebook) [2]
      • Data examples:
      • Interactive Regression - Machine Learning (iPython Notebook) [6]
      • Interactive Logistic Regression (iPython Notebook) [7]
      • Curve Fitting (iPython Notebook) [8]

      Homework - Workshop 1:

      • Workshop iPython Notebook: [9]
      • Dataset 1 (mobile phone robbery data sample - Colombia, 2017): [10]
      • Dataset 2 (customer database - marketing): [11]

      Session 3

      • Detecting missing values
      • Imputing missing values
      • Understanding data sampling in Python
      • Parametric and non-parametric statistical inference
      • Conducting an exact binomial test
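To make the missing-value and exact binomial test topics above concrete, the following standard-library sketch detects missing entries, imputes them with the mean, and computes a two-sided exact binomial p-value. The data, the function names `binom_pmf` and `exact_binomial_test`, and the choice of mean imputation are assumptions made for illustration. The two-sided rule used here sums the probabilities of all outcomes that are no more likely than the observed one.

```python
import math
import statistics

# Hypothetical measurements; None marks a missing value.
values = [4.0, None, 6.0, None, 8.0]

# Detect missing entries by position.
missing_idx = [i for i, v in enumerate(values) if v is None]

# Mean imputation: replace each missing value with the mean of the observed ones.
observed = [v for v in values if v is not None]
fill = statistics.mean(observed)
imputed = [fill if v is None else v for v in values]


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def exact_binomial_test(successes, trials, p=0.5):
    """Two-sided exact p-value under H0: success probability equals p."""
    observed_prob = binom_pmf(successes, trials, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = 0.0
    for k in range(trials + 1):
        prob = binom_pmf(k, trials, p)
        if prob <= observed_prob * tolerance:
            total += prob
    return min(1.0, total)


p_value = exact_binomial_test(7, 10, p=0.5)
print(missing_idx, imputed, p_value)
```

Mean imputation is the simplest option and shrinks the variance of the imputed variable, so it is best read as a baseline rather than a recommendation.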

      Files to download

      • Sampling and imputation [12]
      • Normal Distribution [13]
      • Example data [14]

      Homework - Workshop 2:

      • Workshop iPython Notebook: [15]

      Session 4

      • Performing the Kolmogorov-Smirnov test
      • Understanding the Wilcoxon Rank Sum and Signed Rank test
      • Working with Pearson’s Chi-squared test
      • Conducting a one-way ANOVA
      • Performing a two-way ANOVA
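As one small example from the tests listed above, Pearson's chi-squared test of independence on a 2x2 table can be sketched without external packages. The counts are invented, and the function name `chi_squared_2x2` is hypothetical. No continuity correction is applied here, so results can differ from library routines that apply one by default. The p-value uses the identity that, for one degree of freedom, the chi-squared survival function equals `erfc(sqrt(x / 2))`.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are groups, columns are outcomes.
statistic, p_value = chi_squared_2x2([[10, 20], [20, 10]])
print(f"chi2={statistic:.4f} p={p_value:.4f}")
```

The closed-form p-value only holds for one degree of freedom; larger tables need a general chi-squared distribution function from a statistics library.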

      Files

      • Nonparametric tests [16]
      • Example data [17]

      Special Session: Project definition


      Session 5

      • Introduction to Machine Learning
      • Supervised and unsupervised learning
      • Getting a suitable dataset
      • Predicting passenger survival with a decision tree
      • Validating the power of prediction with a confusion matrix
      • Assessing performance with the ROC curve
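The validation topics above (confusion matrix and ROC curve) can be illustrated independently of any particular classifier. The sketch below assumes hypothetical true labels and predicted scores, thresholds the scores at 0.5, and computes the area under the ROC curve as the fraction of positive/negative pairs that the scores rank correctly. The helper names are illustrative and not part of the course material.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for ps in positives:
        for ns in negatives:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = survived) and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]

# Turn scores into class predictions with a 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
auc = roc_auc(y_true, scores)
print(tp, fp, fn, tn, accuracy, auc)
```

The pairwise AUC is quadratic in the number of samples, which is fine for teaching examples; library implementations use a sort-based computation instead.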

      Files:

      • Session presentation [18]
      • Session notebook [19]
      • Balance scale data file [20]
      • Titanic test [21]
      • Titanic test [22]
      • Graphviz [23]

      Chapter 2. Classification (I) – Tree, Lazy, and Probabilistic

      Session 6

      • Preparing the training and testing datasets
      • Building a classification model with recursive partitioning trees
      • Visualizing a recursive partitioning tree
      • Measuring the prediction performance of a recursive partitioning tree
      • Pruning a recursive partitioning tree

      Session 7

      • Building a classification model with a conditional inference tree
      • Visualizing a conditional inference tree
      • Measuring the prediction performance of a conditional inference tree
      • Classifying data with logistic regression

      Applied Material (to be updated)

      Chapter 3. Working with Spatial Data

      Session 8

      • Motivation: What's so great about spatial data?
      • Spatial data structures.
      • Making maps.
        • Static maps with Python.
        • Projections.
        • Geocoding, routes, and distances.
        • Dynamic maps with leaflet.
      • Extended example: Congressional districts.
        • Election results.
        • Congressional districts.
        • Putting it all together.
        • Using leaflet.
      • Effective maps: How (not) to lie.
      • Extended example: Historical airline route maps.
      • Projecting polygons.