{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Modeling and Simulation in Python\n", "\n", "Chapter 10\n", "\n", "Copyright 2017 Allen Downey\n", "\n", "License: [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0)\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Configure Jupyter so figures appear in the notebook\n", "%matplotlib inline\n", "\n", "# Configure Jupyter to display the assigned value after an assignment\n", "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n", "\n", "# import functions from the modsim.py module\n", "from modsim import *\n", "\n", "from pandas import read_html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Under the hood\n", "\n", "To get a `DataFrame` and a `Series`, I'll read the world population data and select a column.\n", "\n", "`DataFrame` and `Series` have an attribute called `shape` that indicates the number of rows and columns. A `Series` is one-dimensional, so its `shape` contains only the number of rows." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "filename = 'data/World_population_estimates.html'\n", "tables = read_html(filename, header=0, index_col=0, decimal='M')\n", "table2 = tables[2]\n", "table2.columns = ['census', 'prb', 'un', 'maddison', \n", " 'hyde', 'tanton', 'biraben', 'mj', \n", " 'thomlinson', 'durand', 'clark']\n", "table2.shape" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "census = table2.census / 1e9\n", "census.shape" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "un = table2.un / 1e9\n", "un.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `DataFrame` has an attribute called `index`, which labels the rows. Here it is an `Int64Index`, which is similar to a NumPy array. (In pandas 2.0 and later, `Int64Index` no longer exists; an integer index is a plain `Index` with an integer `dtype`.)"
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [], "source": [ "table2.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And `columns`, which labels the columns." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [], "source": [ "table2.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And `values`, which is a two-dimensional NumPy array containing the data." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": false }, "outputs": [], "source": [ "table2.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `Series` does not have `columns`, but it does have `name`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [], "source": [ "census.name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It contains `values`, which is an array." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "census.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And it contains `index`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "census.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you ever wonder what kind of object a variable refers to, you can use the `type` function. The result indicates what type the object is and the module where that type is defined.\n", "\n", "`DataFrame`, `Int64Index`, `Index`, and `Series` are defined by pandas.\n", "\n", "`ndarray` is defined by NumPy."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "type(table2)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "type(table2.index)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "type(table2.columns)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "type(table2.values)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "type(census)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "type(census.index)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": true }, "outputs": [], "source": [ "type(census.values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optional exercise\n", "\n", "The following exercise provides a chance to practice what you have learned so far, and perhaps to develop a different growth model. If you feel comfortable with what we have done so far, you might want to give it a try.\n", "\n", "**Optional Exercise:** On the Wikipedia page about world population estimates, one of the tables contains estimates for prehistoric populations. The following cells process this table and plot some of the results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select `tables[1]`, which is the second table that `read_html` found on the page." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "table1 = tables[1]\n", "table1.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not all agencies and researchers provided estimates for the same dates. Again, `NaN` is the special value that indicates missing data."
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "table1.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some of the estimates are in a form we can't read as numbers. We could clean them up by hand, but for simplicity I'll replace any value that has an `M` in it with `NaN`." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "table1.replace('M', np.nan, regex=True, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we'll replace the long column names with more convenient abbreviations." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "table1.columns = ['prb', 'un', 'maddison', 'hyde', 'tanton', \n", " 'biraben', 'mj', 'thomlinson', 'durand', 'clark']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function plots selected estimates." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def plot_prehistory(table):\n", " \"\"\"Plots population estimates.\n", " \n", " table: DataFrame\n", " \"\"\"\n", " plot(table.prb, 'ro', label='PRB')\n", " plot(table.un, 'co', label='UN')\n", " plot(table.hyde, 'yo', label='HYDE')\n", " plot(table.tanton, 'go', label='Tanton')\n", " plot(table.biraben, 'bo', label='Biraben')\n", " plot(table.mj, 'mo', label='McEvedy & Jones')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the results. Notice that we are working in millions now, not billions." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": { "scrolled": false }, "outputs": [], "source": [ "plot_prehistory(table1)\n", "decorate(xlabel='Year', \n", " ylabel='World population (millions)',\n", " title='Prehistoric population estimates')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use `xlim` to zoom in on everything after Year 0." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "plot_prehistory(table1)\n", "decorate(xlim=[0, 2000], xlabel='Year', \n", " ylabel='World population (millions)',\n", " title='Prehistoric population estimates')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See if you can find a model that fits these data well from Year -1000 to 1940, or from Year 1 to 1940.\n", "\n", "How well does your best model predict actual population growth from 1950 to the present?" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }