MMS: Data Mining (2018-I)

From hpcwiki

Revision as of 17:49, 6 February 2018 by Hfrancot (Talk | contribs)

Course contents

Chapter 1. Practical Machine Learning with Python

  • Downloading and installing Python
  • Installing and loading packages
  • Reading and writing data
  • Using Python to manipulate data
  • Applying basic statistics
  • Visualizing data
  • Getting a dataset for machine learning


Chapter 2. Data Exploration with Real Live Data

  • Reading a dataset from a CSV file
  • Converting types on character variables
  • Detecting missing values
  • Imputing missing values
  • Exploring and visualizing data
  • Predicting passenger survival with a decision tree
  • Validating the power of prediction with a confusion matrix
  • Assessing performance with the ROC curve
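
As an illustration of the confusion-matrix topic above, here is a minimal sketch that tallies the four cells for a binary classifier and derives accuracy from them. It uses only the standard library; the function names and the label lists are hypothetical and are not taken from the course material.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for binary labels."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    # Always return all four cells, even when a count is zero.
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def accuracy(cm):
    """Fraction of predictions that match the actual label."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


# Hypothetical labels: 1 = survived, 0 = did not survive.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))
```

In practice a library routine would likely replace this hand-rolled version; the sketch is only meant to show what the matrix counts.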

Chapter 3. Python and Statistics

In this chapter, we will cover the following topics:

  • Understanding data sampling in Python
  • Operating a probability distribution in Python
  • Working with univariate descriptive statistics in Python
  • Performing correlations
  • Conducting an exact binomial test
  • Performing the Kolmogorov-Smirnov test
  • Understanding the Wilcoxon Rank Sum and Signed Rank tests
  • Working with Pearson’s Chi-squared test
  • Conducting a one-way ANOVA
  • Performing a two-way ANOVA
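
To make the correlation topic above concrete, the following is a minimal sketch of the Pearson correlation coefficient computed from its definition with the standard library only. The function name `pearson_r` and the sample data are assumptions for illustration.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y is an exact linear function of x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_up = pearson_r(x, y)
r_down = pearson_r(x, list(reversed(y)))
print(r_up, r_down)
```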


Chapter 4. Understanding Regression Analysis

  • Fitting a linear regression model with Python
  • Summarizing linear model fits
  • Using linear regression to predict unknown values
  • Generating a diagnostic plot of a fitted model
  • Fitting a polynomial regression model with Python
  • Fitting a robust linear regression model with Python
  • Studying a case of linear regression on SLID data
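
The fitting and prediction topics above can be sketched with ordinary least squares for a single predictor. This is a standard-library illustration under the assumption of one explanatory variable; `fit_line` and `predict` are hypothetical helper names, and the data are made up.

```python
from statistics import mean


def fit_line(x, y):
    """Return (slope, intercept) minimizing squared error for y ~ x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x_new):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x_new + intercept


# Hypothetical data lying exactly on y = 2x + 1.
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
model = fit_line(x, y)
print(model, predict(model, 4))
```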


Chapter 5. Classification (I) – Tree, Lazy, and Probabilistic

In this chapter, we will cover the following recipes:

  • Preparing the training and testing datasets
  • Building a classification model with recursive partitioning trees
  • Visualizing a recursive partitioning tree
  • Measuring the prediction performance of a recursive partitioning tree
  • Pruning a recursive partitioning tree
  • Building a classification model with a conditional inference tree
  • Visualizing a conditional inference tree
  • Measuring the prediction performance of a conditional inference tree
  • Classifying data with logistic regression
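
For the first recipe, preparing training and testing datasets, a minimal sketch of a reproducible random split is shown here. The function name, the 30% test fraction, and the seed are illustrative assumptions, not values prescribed by the course.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A private Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of ten row identifiers.
rows = list(range(10))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed means the same rows land in the test set on every run, which makes later performance measurements comparable.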

Applied Material (to be updated)

Chapter 6. Working with Spatial Data

  • Motivation: What's so great about spatial data?
  • Spatial data structures.
  • Making maps.
    • Static maps with Python.
    • Projections.
    • Geocoding, routes, and distances.
    • Dynamic maps with leaflet.
  • Extended example: Congressional districts.
    • Election results.
    • Congressional districts.
    • Putting it all together.
    • Using leaflet.
  • Effective maps: How (not) to lie.
  • Extended example: Historical airline route maps.
  • Projecting polygons.
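
As a small illustration of the distances topic in this chapter, the sketch below computes the great-circle distance between two latitude/longitude points with the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two antipodal points on the equator: half the circumference apart.
half_way_round = haversine_km(0.0, 0.0, 0.0, 180.0)
print(half_way_round)
```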