# Machine Learning Tutorial

This module illustrates basic machine learning in Python using scikit-learn.

We do not assume a priori that any single model will be best for the data. Instead, we loop over multiple classifiers and parameterizations. In this way, we can run hundreds of models and select the best one on a variety of metrics, such as precision and recall.
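The looping idea can be sketched in plain Python. The snippet below is an illustration only, not the code in the magicloops submodule: `threshold_rule`, `MODELS`, `GRIDS`, and the toy rows are hypothetical stand-ins for real scikit-learn classifiers and data. It expands each parameter grid into individual settings, scores every setting on precision and recall, and then picks the best result per metric.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the parameter values in `grid`."""
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def threshold_rule(rows, feature, cutoff):
    """Hypothetical toy 'classifier': predict 1 when feature >= cutoff."""
    return [1 if row[feature] >= cutoff else 0 for row in rows]


# Hypothetical registry: model name -> prediction function and parameter grid.
MODELS = {"threshold_rule": threshold_rule}
GRIDS = {"threshold_rule": {"feature": ["age", "educ"], "cutoff": [12, 16, 40, 60]}}

# Made-up rows and labels, for illustration only.
rows = [
    {"age": 23, "educ": 12},
    {"age": 35, "educ": 16},
    {"age": 47, "educ": 14},
    {"age": 62, "educ": 18},
    {"age": 71, "educ": 10},
]
y_true = [0, 0, 1, 1, 1]

results = []
for name, predict in MODELS.items():
    for params in expand_grid(GRIDS[name]):
        y_pred = predict(rows, **params)
        precision, recall = precision_recall(y_true, y_pred)
        results.append({"model": name, "params": params,
                        "precision": precision, "recall": recall})

# The "best" model can differ depending on which metric is used.
best_by_precision = max(results, key=lambda r: r["precision"])
best_by_recall = max(results, key=lambda r: r["recall"])
print(best_by_precision)
print(best_by_recall)
```

In this toy data the setting with the highest precision is not the first setting to reach full recall, which is why comparing models on several metrics is useful.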

Example Precision-Recall Plot:

Quick Start Guide

Example data for this repository comes from the General Social Survey (GSS) 2014. More notes on the data preprocessing are detailed in the data folder.

To run the example:

git clone --recursive https://github.com/jmausolf/Python_Tutorials
cd Python_Tutorials/Machine_Learning
python run.py

The Details

To change the default data, outcome variable, or parameters, open the run.py script in your favorite text editor, such as Atom, Sublime, or Vim. There you must specify the dataset (as a .csv file), the outcome variable you would like to predict (for binary classification), and the features you would like to use to make the predictions.

Default Example:

#Define Data
dataset  = 'file'
outcome  = 'partyid_str_rep'
features = ['age', 'sex', 'race', 'educ', 'rincome']

Once you edit these fields, save the script and, in a terminal, run: python run.py
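Before launching a long run, it can help to confirm that the names set in run.py match the CSV header and that the outcome is binary. The sketch below is not part of run.py: `check_columns` is a hypothetical helper, and the in-memory CSV with made-up values stands in for a real file so the example is self-contained.

```python
import csv
import io


def check_columns(csv_file, outcome, features):
    """Check that outcome/features are columns and the outcome has two values.

    Hypothetical helper for illustration; `csv_file` is any open text file.
    Returns the set of distinct, non-empty outcome values found.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in [outcome] + list(features) if name not in header]
    if missing:
        raise ValueError("Columns not found in dataset: {}".format(missing))

    values = {row[outcome] for row in reader if row[outcome] != ""}
    if len(values) != 2:
        raise ValueError("Outcome {!r} is not binary: {}".format(outcome, sorted(values)))
    return values


# Made-up rows that reuse the column names from the default example.
sample = io.StringIO(
    "age,sex,race,educ,rincome,partyid_str_rep\n"
    "34,1,1,16,12,0\n"
    "58,2,1,12,9,1\n"
    "41,2,3,14,10,0\n"
)

outcome = 'partyid_str_rep'
features = ['age', 'sex', 'race', 'educ', 'rincome']
print(check_columns(sample, outcome, features))
```

With a real dataset, the same helper could be called on a file opened with `open(dataset, newline='')`.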

Note:

Your data may have hundreds of features (independent variables/predictors). If you would like to use all of them (and would rather not type them all out explicitly), simply use the --all_features option of the magic loop.

python run.py --all_features True
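One way a script could accept a flag of this shape is sketched below. This is not the actual parsing code in run.py: `str_to_bool` and `select_features` are hypothetical helpers, and the column names are reused from the default example. Because the flag is passed with an explicit value (True), the sketch converts that value from a string instead of using a store-true switch.

```python
import argparse


def str_to_bool(text):
    """Convert command-line text such as 'True' or 'false' to a bool."""
    lowered = text.strip().lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got {!r}".format(text))


def select_features(columns, outcome, features, all_features):
    """Use every column except the outcome when all_features is set."""
    if all_features:
        return [col for col in columns if col != outcome]
    return list(features)


parser = argparse.ArgumentParser()
parser.add_argument("--all_features", type=str_to_bool, default=False)

# An explicit argument list is parsed here so the example runs anywhere;
# a real script would call parser.parse_args() with no arguments.
args = parser.parse_args(["--all_features", "True"])

columns = ['age', 'sex', 'race', 'educ', 'rincome', 'partyid_str_rep']
chosen = select_features(columns, 'partyid_str_rep', ['age', 'sex'], args.all_features)
print(chosen)
```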

One aspect overlooked thus far is feature development. The GSS data in the example is not in an ideal form. For example, most of the variables are categorical. We might want to make indicators for each categorical column or calculate various aggregations or interactions, among other possibilities. An ideal data pipeline would make these changes to the feature set prior to running this script.
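As one example of such a step, the sketch below expands a categorical column into 0/1 indicator columns using only the standard library. `add_indicators` is a hypothetical helper, and the rows and category labels are made up rather than taken from the GSS. In practice, a library routine such as pandas.get_dummies does the same job.

```python
def add_indicators(rows, column):
    """Return copies of `rows` with `column` replaced by 0/1 indicator columns.

    Hypothetical helper: one new key per distinct value, named column_value.
    """
    levels = sorted({row[column] for row in rows})
    expanded = []
    for row in rows:
        # Copy every field except the categorical one being expanded.
        new_row = {key: value for key, value in row.items() if key != column}
        for level in levels:
            new_row["{}_{}".format(column, level)] = int(row[column] == level)
        expanded.append(new_row)
    return expanded


# Made-up rows; the category labels are illustrative, not actual GSS codes.
rows = [
    {"age": 34, "race": "white"},
    {"age": 58, "race": "black"},
    {"age": 41, "race": "other"},
]

for row in add_indicators(rows, "race"):
    print(row)
```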

Acknowledgements

This tutorial makes use of a modified submodule, originally forked from @rayidghani's magicloops. It has been updated to run in Python 2 or Python 3. In addition, my fork changes the plotting code and several of the functions to take a user-specified dataset, outcome variable, and features.

