MMS: Data Mining (2018-I)
Course contents
Chapter 1. Practical Machine Learning with Python
- Downloading and installing Python
- Installing and loading packages
- Reading and writing data
- Using Python to manipulate data
- Applying basic statistics
- Visualizing data
- Getting a dataset for machine learning
Chapter 2. Data Exploration with Real-Life Data
- Reading a dataset from a CSV file
- Converting types on character variables
- Detecting missing values
- Imputing missing values
- Exploring and visualizing data
- Predicting passenger survival with a decision tree
- Validating the power of prediction with a confusion matrix
- Assessing performance with the ROC curve
Chapter 3. Python and Statistics
In this chapter, we will cover the following topics:
- Understanding data sampling in Python
- Operating a probability distribution in Python
- Working with univariate descriptive statistics in Python
- Performing correlations
- Conducting an exact binomial test
- Performing the Kolmogorov-Smirnov test
- Understanding the Wilcoxon Rank Sum and Signed Rank test
- Working with Pearson’s Chi-squared test
- Conducting a one-way ANOVA
- Performing a two-way ANOVA
Chapter 4. Understanding Regression Analysis
- Fitting a linear regression model with Python
- Summarizing linear model fits
- Using linear regression to predict unknown values
- Generating a diagnostic plot of a fitted model
- Fitting a polynomial regression model with Python
- Fitting a robust linear regression model with Python
- Studying a case of linear regression on SLID data
Chapter 5. Classification (I) – Tree, Lazy, and Probabilistic
In this chapter, we will cover the following recipes:
- Preparing the training and testing datasets
- Building a classification model with recursive partitioning trees
- Visualizing a recursive partitioning tree
- Measuring the prediction performance of a recursive partitioning tree
- Pruning a recursive partitioning tree
- Building a classification model with a conditional inference tree
- Visualizing a conditional inference tree
- Measuring the prediction performance of a conditional inference tree
- Classifying data with logistic regression
Applied Material (to be updated)
Chapter 6. Working with Spatial Data
- Motivation: What's so great about spatial data?
- Spatial data structures.
- Making maps.
- Static maps with Python.
- Projections.
- Geocoding, routes, and distances.
- Dynamic maps with leaflet.
- Extended example: Congressional districts.
- Election results.
- Congressional districts.
- Putting it all together.
- Using leaflet.
- Effective maps: How (not) to lie.
- Extended example: Historical airline route maps.
- Projecting polygons.