MMS: Data Mining (2018-I)
From hpcwiki
Course contents
Chapter 1. Introduction to Python
Session 1
- Downloading and installing Python
- Installing and loading packages
- Reading and writing data
- Converting types of character variables
- Presentation: File:MMS DataMining1.pdf
- First script [1]
- Using Python to manipulate data
- Reading a dataset from a CSV file
- Applying basic statistics
- Working with univariate descriptive statistics in Python
- Performing correlations
- Working with probability distributions in Python
- Fitting a linear regression model with Python
- Summarizing linear model fits
- Session 2 (IPython Notebook) [2]
- Data examples:
- Interactive Regression - Machine Learning (IPython Notebook) [6]
- Interactive Logistic Regression (IPython Notebook) [7]
- Curve Fitting (IPython Notebook) [8]
- Workshop IPython Notebook: [9]
- Dataset 1 (mobile phone robbery data sample, Colombia, 2017): [10]
- Dataset 2 (customer database, marketing): [11]
- Detecting missing values
- Imputing missing values
- Understanding data sampling in Python
- Parametric and non-parametric statistical inference
- Conducting an exact binomial test
- Workshop IPython Notebook: [15]
- Performing the Kolmogorov-Smirnov test
- Understanding the Wilcoxon Rank Sum and Signed Rank tests
- Working with Pearson’s Chi-squared test
- Conducting a one-way ANOVA
- Performing a two-way ANOVA
- Proposing the final course project, based on a challenging real-life data analysis problem
- Form: Data Analysis Problem definition
- Introduction to Machine Learning
- Supervised and unsupervised learning
- Getting a suitable dataset
- Predicting passenger survival with a decision tree
- Validating the power of prediction with a confusion matrix
- Assessing performance with the ROC curve
- Session presentation [18]
- Session notebook [19]
- Balance scale data file [20]
- Titanic test [21]
- Titanic test [22]
- Graphviz [23]
- Preparing the training and testing datasets
- Building a classification model with recursive partitioning trees
- Visualizing a recursive partitioning tree
- Measuring the prediction performance of a recursive partitioning tree
- Pruning a recursive partitioning tree
- Building a classification model with a conditional inference tree
- Visualizing a conditional inference tree
- Measuring the prediction performance of a conditional inference tree
- Classifying data with logistic regression
- Motivation: What's so great about spatial data?
- Spatial data structures
- Making maps
- Static maps with Python
- Projections
- Geocoding, routes, and distances
- Dynamic maps with Leaflet
- Extended example: Congressional districts
- Election results
- Congressional districts
- Putting it all together
- Using Leaflet
- Effective maps: How (not) to lie
- Extended example: Historical airline route maps
- Projecting polygons
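Several Chapter 1 items (imputing missing values, the exact binomial test, the confusion matrix and the ROC curve) can be tried without installing extra packages. The block below is a minimal sketch using only the Python standard library; the function names `impute_mean`, `binom_test_two_sided`, `confusion_matrix` and `roc_auc` are hypothetical helpers for illustration, not the course notebooks' code. Missing values are assumed to be encoded as `None`, and binary labels as 1 (positive) and 0 (negative).

```python
import math
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: p-value for k successes in n trials.

    Adds up the probabilities of every outcome that is no more likely than
    the observed one (the minimum-likelihood rule, as in R's binom.test).
    """
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 1 / 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, q in pairs if t == 1 and q == 1)
    fp = sum(1 for t, q in pairs if t == 0 and q == 1)
    fn = sum(1 for t, q in pairs if t == 1 and q == 0)
    tn = sum(1 for t, q in pairs if t == 0 and q == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """Area under the ROC curve via its rank (Mann-Whitney) formulation.

    The AUC equals the share of (positive, negative) pairs in which the
    positive example receives the higher score, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(impute_mean([2.0, None, 4.0]))  # [2.0, 3.0, 4.0]
    print(binom_test_two_sided(9, 10))  # 0.021484375
    tp, fp, fn, tn = confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    print(tp, fp, fn, tn, accuracy)  # 2 1 1 1 0.6
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # 0.75
```

For real datasets, the same operations are available as pandas `fillna`, `scipy.stats.binomtest`, and `sklearn.metrics.confusion_matrix` / `roc_auc_score`. The hand-written versions are only meant to make the underlying arithmetic visible.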
Chapter 2. Data Exploration with Real-Life Data
Session 2
Files for download:
Homework - Workshop 1:
Session 3
Files for download:
Homework - Workshop 2:
Session 4
Files for download:
Special Session: Project definition
Session 5
Files for download: