Lecturer: Josh Bloom (UC Berkeley; Wise.io, Inc.)
Lecture Slides (PDF) here
View the IPython notebook here
-
What is machine learning?
- Flavors and facets of machine learning
- supervised, semi-supervised, clustering, ...
- classification / regression
- When to use it, when not to
- scikit-learn
- testing/validation sets, cross-validation
- metrics: ROC, AUC, confusion matrix
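The validation ideas listed above (a held-out test set, cross-validation, and the confusion matrix and AUC metrics) can be sketched in a few lines. This is only an illustration under assumptions: the data are synthetic, the random forest is an arbitrary choice of model, and the imports use the `sklearn.model_selection` module of recent scikit-learn releases (older releases such as 0.15 kept these helpers in `sklearn.cross_validation`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic two-class data standing in for a real labelled dataset (assumption).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)

# Hold out 25% of the rows as a test set the model never sees while fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation, run on the training portion only.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)

# Fit once on the full training set, then score on the held-out test set.
clf.fit(X_train, y_train)
cm = confusion_matrix(y_test, clf.predict(X_test))
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print("CV accuracy per fold:", cv_scores)
print("Confusion matrix (rows = true class, columns = predicted):")
print(cm)
print("Test-set ROC AUC:", auc)
```

Keeping the test set out of the cross-validation loop means the final numbers are measured on data that played no part in any modelling decision.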
-
Regression
- Linear regression
- kNN
- random forest
[breakout: predict quasar redshifts from photometric data]
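As a rough sketch of how the three regressors above might be compared, the example below fits each one to the same train/test split and reports the root-mean-square error. The features and target are made up: they are not the quasar photometry from the breakout, and the imports assume a recent scikit-learn release.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)

# Hypothetical stand-in for five photometric features per object.
X = rng.uniform(-1.0, 1.0, size=(600, 5))
# Made-up nonlinear target with noise, playing the role of redshift.
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "kNN": KNeighborsRegressor(n_neighbors=5),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit every model on the same training rows and score on the same test rows.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
    print(name, "test RMSE:", rmse[name])
```

Because all three estimators share the `fit`/`predict` interface, swapping models is a one-line change.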
-
Classification
- SVM
- random forest
- deep learning
[breakout: predict Star/Galaxy/QSO from photometric data]
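A minimal sketch of a three-class problem in the spirit of the breakout, comparing an SVM with a random forest. The data are synthetic (the class labels 0, 1, 2 merely play the role of Star/Galaxy/QSO), the hyperparameters are defaults rather than tuned values, and deep learning is left out to keep the example short. Imports assume a recent scikit-learn release.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data; not real photometry (assumption).
X, y = make_classification(n_samples=900, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# SVMs are sensitive to feature scale, so standardize using training
# statistics only and apply the same transform to the test set.
scaler = StandardScaler().fit(X_train)
svm = SVC(kernel="rbf").fit(scaler.transform(X_train), y_train)
svm_acc = accuracy_score(y_test, svm.predict(scaler.transform(X_test)))

# Tree ensembles do not depend on feature scale, so no scaling here.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_acc = accuracy_score(y_test, forest.predict(X_test))

print("SVM test accuracy:", svm_acc)
print("Random forest test accuracy:", forest_acc)
```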
-
Improving your models
- hyperparameter optimization (GridSearchCV)
- dealing with missing data
- Feature selection / feature importance
- feature engineering
[breakout: redo Star/Galaxy/QSO from photometric data]
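The model-improvement steps above can be strung together roughly as follows: fill in missing values, search a small hyperparameter grid with `GridSearchCV`, then read feature importances off the best model. Everything here is illustrative. The data are synthetic, the grid values are arbitrary, and `SimpleImputer` and `sklearn.model_selection` are the names used by recent scikit-learn releases (0.15 used `sklearn.preprocessing.Imputer` and `sklearn.grid_search`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)

# Blank out roughly 5% of the entries to mimic missing measurements.
rng = np.random.RandomState(0)
X_missing = X.copy()
X_missing[rng.uniform(size=X.shape) < 0.05] = np.nan

# Replace each missing value with the median of its column.
X_filled = SimpleImputer(strategy="median").fit_transform(X_missing)

# Try every combination in a small, arbitrary grid with 3-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3)
search.fit(X_filled, y)

# By default the best model is refit on all the data after the search.
importances = search.best_estimator_.feature_importances_

print("Best parameters:", search.best_params_)
print("Best mean CV accuracy:", search.best_score_)
print("Feature importances:", importances)
```

Imputing before the search, as done here for brevity, lets column medians from the validation folds influence training. Placing the imputer and the model together in a scikit-learn pipeline and searching over that avoids the leak.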
-
Considerations in getting into production
- multicore / multimachine
- scikit-learn pipelines
- Big-data machine learning: GraphLab, MLlib (Spark)
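Two of the production points above, pipelines and multicore execution, can be illustrated in one short sketch. It assumes synthetic data and a recent scikit-learn release. The step names `"scale"` and `"forest"` are arbitrary labels chosen for this example, and `n_jobs=-1` asks the forest to use all available cores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Bundle preprocessing and the model into one estimator, so the scaler is
# re-fit on the training part of each fold and never sees validation rows.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(n_estimators=100, n_jobs=-1,
                                      random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=3)
print("Pipeline CV accuracy per fold:", scores)

# After fitting, the whole pipeline predicts on raw, unscaled inputs.
pipe.fit(X, y)
print("Predictions for the first five rows:", pipe.predict(X[:5]))
```

A fitted pipeline is a single object, which makes it simpler to serialize and deploy than a separate scaler and model.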
-
Make sure you have the latest version (0.15) of scikit-learn
conda update scikit-learn
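One way to confirm which version is installed is to check it from Python. The helper `version_tuple` below is a hypothetical name introduced for this sketch, not part of scikit-learn. It reads only the leading major.minor digits of `sklearn.__version__`.

```python
import re

import sklearn


def version_tuple(version):
    """Return (major, minor) from a version string such as '0.15.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


installed = version_tuple(sklearn.__version__)
# Compare against the 0.15 release mentioned above.
meets_minimum = installed >= (0, 15)

print("scikit-learn version:", sklearn.__version__)
print("At least 0.15:", meets_minimum)
```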
-
Download some datasets locally
- TBD
| Time | What | Materials |
|---|---|---|
| 9:00-9:30 | Arrival/Caffeinate | Coffee. Other performance-enhancing drugs. |
| 9:30 - ... | TBD | |
-
Scikit-learn: Machine Learning in Python
-
Josh's lectures on scikit-learn from his graduate seminar class:
- Notebooks. Take a look at newsgroups.ipynb to see some natural language processing.