{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Learning the ropes of *Python*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome in this four-part notebook which aims to prepare you to use *Python* during your master dissertation. As these notebooks start from scratch, some of these topics may seem too simple. Of course, feel free to skip those topics that are too easy, and concentrate on the harder ones (or the challenges at the end of each notebook).\n", "\n", "The *Python* notebooks are divided in four parts:\n", "1. The Basics: Syntax, Strings, and Conditionals\n", "2. Functions and Classes\n", "3. Lists, Loops, Dictionaries, and File I/O\n", "4. NumPy and Matplotlib\n", "\n", "In every notebook, we have not only provided an introduction to each subject, but also some small tests and larger challenges for you to tackle. This is the ideal way to verify whether you completely understand the topics at hand. \n", "\n", "Note that we cannot be exhaustive. However, these notebooks aim to get you started with *Python* with minimum effort. Further details can be found on the [*Python* homepage](http://docs.python.org/3/), the [*NumPy* homepage](http://www.numpy.org/), and the [*Matplotlib* homepage](http://matplotlib.org/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4: *NumPy* and *Matplotlib*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Numbers have an important story to tell. They rely on you to give them a clear and convincing voice.*
\n", "
-- Stephen Few, Data scientist and author
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chapter 17: *NumPy*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous three notebooks, we have used the built-in arithmetic functions of *Python* as well as the `math` module to do basic calculations. However, once more elaborate arithmetic operations are involved, or once we want to make efficient use of vector and matrix calculus, *NumPy* is an indispensable tool.\n", "\n", "*NumPy* is the fundamental package for scientific computing with *Python*. It contains among other things:\n", " * a powerful N-dimensional array object\n", " * useful linear algebra, Fourier transform, and random number capabilities\n", " * sophisticated (broadcasting) functions\n", " * tools for integrating C/C++ and Fortran code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this chapter, we will briefly go over the first two of these points, as they will be used the most throughout your thesis. However, *NumPy* contains much more than what will be covered in this notebook!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But first things first. *NumPy* is automatically installed when you used *Anaconda* to install *Python*. The module can be loaded using the command" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We strongly advise not to use a wild card import for *NumPy*, as many of its functions share their names with built-in *Python* functions, such as `sum`. And yes, we used an alias to shorten the name from five to two characters: *NumPy* will be called very often." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 17.1. Why we need *NumPy*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might wonder why we need a separate data type for vectors. Can't we just use the *Python* list data type? Well, let's see how natural that is if we try to add two \"vectors\":" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "vec_1 = [1,2,3]\n", "vec_2 = [4,5,6]\n", "\n", "print(vec_1+vec_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, that's clearly not what we wanted... The `+` operator concatenates the two lists, rather than summing them...\n", "\n", "*NumPy*’s main object is the homogeneous **multidimensional array**. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers. This multidimensional array may seem a lot like a nested list in *Python*, but adhering to the normal intuitions one has about vector and matrix calculus. Some nomenclature: in *NumPy*, dimensions are called **axes**. The number of axes is the **rank** of the array." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*NumPy*’s `array` class is called `ndarray`. It is also known by the alias `array`. The most important attributes of an `ndarray` object are:\n", "\n", "1. `ndarray.ndim`: returns the rank of the array\n", "2. `ndarray.shape`: returns a tuple of integers indicating the size of the array in each dimension\n", "3. `ndarray.size`: the total number of elements in the array\n", "4. `ndarray.dtype`: an object describing the type of the elements in the array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how this works in practice. Recall the vector sum we tried to evaluate before. With *NumPy*, we can easily convert a *Python* list to an `ndarray` by using `np.array(list)`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# First, let's see what the original datatype of our vector was\n", "print(type(vec_1))\n", "\n", "# Now, let's recast the two vectors as NumPy ndarrays\n", "vec_1 = np.array(vec_1)\n", "vec_2 = np.array(vec_2)\n", "\n", "# Print out the new datatype\n", "print(type(vec_1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do we now get when summing these vectors?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(vec_1+vec_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That seems more like it! As noted above, a `ndarray` has four major attributes, which will be used quite often, for instance for looping over arrays." ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# Printing the rank (here a 1D array)\n", "print(vec_1.ndim)\n", "\n", "# Printing the shape (here 1 dimension with 3 elements)\n", "print(vec_1.shape)\n", "\n", "# Printing the size (here 3)\n", "print(vec_1.size)\n", "\n", "# Printing the datatype of the elements contained in the array (here integers)\n", "print(vec_1.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that, when printing out the datatype of the elements contained in the *NumPy* array of integers above, we got **int32** back instead of the expected **int**. Indeed, *NumPy* provides datatypes of its own, which are often more efficient." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a last remark in this introduction, you might wonder why we can't simply loop over the two *Python* lists to sum them. As in many cases, the answer is: efficiency. In the code below, we use list comprehension to sum two *Python* lists in a naive way. (The lists are longer than above, to clearly show the difference in efficiency)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# The timeit module will be used to test the performance of some Python code\n", "from timeit import Timer\n", "\n", "# The code to be timed\n", "stmt = \"[i1 + i2 for i1, i2 in zip(list_1, list_2)]\"\n", "\n", "# Input needed to run the code to be timed\n", "setup = \"\"\"\\\n", "list_1 = [i**2 for i in range(1, 101)]\n", "list_2 = [i * (i+1) for i in range(1, 101)]\n", "\"\"\"\n", "\n", "# Compute 7 times the time it takes to compute the sum of lists 10000 times.\n", "# Then take the lowest value, which is the most reproducible statistic.\n", "nloop = 10000\n", "cputime = min(Timer(stmt, setup).repeat(7, nloop))/nloop\n", "\n", "print(\"The above calculation took {:.2f} microseconds.\".format(cputime*1e6))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's do the same with the *NumPy* arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# The code to be timed\n", "stmt = \"vec_1 + vec_2\"\n", "\n", "# Input needed to run the code to be timed\n", "setup = \"\"\"\\\n", "import numpy as np\n", "vec_1 = np.array([i**2 for i in range(1, 101)])\n", "vec_2 = np.array([i * (i+1) for i in range(1, 101)])\n", "\"\"\"\n", "\n", "cputime = min(Timer(stmt, setup).repeat(7, nloop))/nloop\n", "\n", "print(\"The above calculation took {:.2f} microseconds.\".format(cputime*1e6))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The exact duration may differ, but the latter code block should be appreciably faster (by about a factor of 10). This difference in efficiency is even more pronounced when working with large and multidimensional arrays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 17.2. Creating arrays: Vectors and matrices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several ways to create arrays. The first one, which we saw above, is simply to recast a regular *Python* list using the `np.array(list)` command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# Creating a 1D array\n", "vec = np.array([1, 1, 2, 3, 5, 8, 13, 21])\n", "print(vec)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that, if you generate *NumPy* arrays in this way, you should include the outer `[]` denoting the list! By using nested lists, we can create multidimensional arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# Creating a 2D array\n", "matrix_2D = np.array([[1, 1, 2, 3], [5, 8, 13, 21]])\n", "print(matrix_2D)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the same way, you can also create matrices with 3 or more dimensions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A second way to generate *NumPy* (1D) arrays is the `arange` function. It is very similar to the *Python* built-in `range` function, using the same arguments (`start`, `stop`, `increment`), with two differences. First, the `arange` function returns a `ndarray` instead of a list, and second, `arange` also accepts float arguments:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(np.arange(10, 30 ,5))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(np.arange(0, 2, 0.3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A third way to generate *NumPy* (1D) arrays is very similar. However, instead of providing the `increment`, the third argument in the `linspace` function is the number of elements we want to obtain (also, the range does now include `stop`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(np.linspace(0, 2, 9))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A fourth way to generate *NumPy* arrays is through the different matrix templates it provides, such as:\n", "* `np.zeros(shape)`: creates an array of shape `shape` (a tuple) containing zeros (as floating point numbers)\n", "* `np.ones(shape)`: creates an array of shape `shape` (a tuple) containing ones (as floating point numbers)\n", "* `np.eye(dim)`: creates the identity matrix of dimension `dim` (an integer)\n", "* `np.random.random(shape)`: creates an array of shape `shape` (a tuple) containing random floating numbers in the range [0,1[" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(np.zeros((2, 5)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(np.ones((2, 5)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(np.eye(3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(np.random.random((2, 5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, for completeness: it is also possible to make a (hard or deep) copy of the contents of an array to a new array using the `ndarray.copy()` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# Initialize 2x5 matrix containing ones\n", "a = np.ones((2, 5))\n", "\n", "# Make a hard copy and store it in b\n", "b = a.copy()\n", "\n", "# Change an element to 4\n", "b[1, 2] = 4\n", "print(b)\n", "print()\n", "\n", "# Meanwhile, the matrix a remains unchanged\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 17.3. Basic array operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While array operations may be more intuitive than list operations, there is one caveat: arithmetic operators on *NumPy* arrays apply **elementwise**. This is very intuitive for summing and subtracting arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "a = np.arange(0, 4, 2)\n", "b = np.arange(4, 8, 2)\n", "\n", "print(a)\n", "print(b)\n", "print(b-a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But it is probably not what you expect if you try to multiply two matrices:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "A = np.array([[1, 1], [0, 1]])\n", "B = np.array([[2, 0], [3, 4]])\n", "print(A)\n", "print(B)\n", "print()\n", "print(A * B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want a matrix product instead of an element-wise product, you can use the `np.dot` function. Since *Python 3.5*, you can also use the `@` operator:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(np.dot(A, B))\n", "print(A @ B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To fulfill your basic statistics needs, *NumPy* also has some interesting functions, such as `max`, `min`, `mean`, `std`, `sum`, `var`. We refer to [this page](http://docs.scipy.org/doc/numpy-dev/user/quickstart.html#functions-and-methods-overview) for all arithmetic functions provided by *NumPy*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 17.4. Indexing, slicing, and reshaping arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Everything we've covered on indexing and slicing *Python* lists can also be used to index and slice *NumPy* arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "fib = np.array([1, 1, 2, 3, 5, 8, 13, 21, 34])\n", "\n", "# Print out the fourth element:\n", "print(fib[3])\n", "\n", "# Print out the fourth to the seventh element:\n", "print(fib[3:7])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition, *NumPy* arrays can also be indexed using **masks**: boolean arrays with the same shape as the original array, containing `True` for elements that need to be kept and `False` for elements that need to be discarded:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# Create an array from 0 to 10\n", "a = np.arange(11)\n", "print(a)\n", "\n", "# Create a boolean array of the same shape stating\n", "# whether the element is even\n", "b = (a % 2 == 0)\n", "print(b)\n", "print()\n", "\n", "# Index the array a based on the mask b\n", "print(a[b])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned before, an array has a shape given by the number of elements along each axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "a = np.floor(10 * np.random.random((3, 4)))\n", "print(a)\n", "print(a.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The shape of an array can be changed with various commands. Note that the following three commands all return a modified array, but do not change the original array:\n", "\n", "* `array.ravel()`: returns the array, flattened (as a 1D array)\n", "* `array.reshape(new_shape)`: returns the array in the new shape (the new shape should contain as many elements as the old one)\n", "* `array.T`: returns the transpose of the original array" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(a.ravel())\n", "print()\n", "print(a.reshape(2, 6))\n", "print()\n", "print(a.T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a dimension is given as `-1` in a reshaping operation, the other dimensions are automatically calculated so that the reshaped matrix contains as many elements as the original one. Of course, you can use `-1` only once per `reshape` call." ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "print(a.reshape(2, -1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chapter 18: *Matplotlib*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`matplotlib.pyplot`, the only *Matplotlib* module we will discuss here, is a collection of command style functions that make *Matplotlib* work like the *MATLAB* plotting functions. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc. \n", "\n", "In `matplotlib.pyplot` various states are preserved across function calls, so that it keeps track of things like the current figure and plotting area, and the plotting functions are directed to the current axes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we start, let's import the module:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 18.1. A first few examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What better way to introduce *matplotlib* than through an example?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "plt.plot([1, 2, 3, 4])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`plot` is a versatile command, and will take an arbitrary number of arguments. To make a plot of `y` as a function of `x`, you simply provide two lists or arrays:\n", "\n", " plot(x, y)\n", "\n", "For every `x`, `y` pair of arguments, there is an optional third argument, which is the **format string** that indicates the color and line type of the plot. The letters and symbols of the format string are borrowed from *MATLAB*, and you concatenate a color string with a line style string. The default format string is `\"b-\"`, which is a solid blue line. For example, to plot with red circles, you would issue:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To plot multiple graphs in one plot, you can simply add arguments to the `plot` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# evenly sampled time at 200ms intervals\n", "t = np.arange(0., 5., 0.2)\n", "\n", "# red dashes, blue squares and green triangles\n", "plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 18.2. Controlling line and graph properties" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lines have many attributes that you can set, such as linewidth, dash style, antialiased. A complete list of line attributes can be found [here](http://matplotlib.org/api/lines_api.html#matplotlib.lines.Line2D).\n", "\n", "In the graphs above, we did not yet name our axis (or our graph), nor did we see how we could clip axis. In the following example, we introduce both a `xlabel` and `ylabel`, a graph `title`, and clip our `y` axis to stop at 60, to better visualize the lower two graphs. Also note that we introduced a bit of LaTeX in the label of the y-axis by putting `r` in front of the string." ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "# evenly sampled time at 200ms intervals\n", "t = np.arange(0., 5., 0.2)\n", "\n", "# red dashes, blue squares and green triangles\n", "plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')\n", "plt.xlabel(\"Time [s]\")\n", "plt.ylabel(r\"A comprehensive label, with $\\sigma_i = 15$\")\n", "plt.title(\"A first labeled graph!\")\n", "plt.ylim([0, 60])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 18.3. Working with multiple figures and axes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you want to plot multiple figures on a given page, for instance to compare them, the `subplot` command is indispensable. To set the stage for a page containing `numrows` rows and `numcols` columns of graphs, the following syntax can be used:\n", "\n", " plt.subplot(numrows, numcols, fignum)\n", "\n", "Here, `fignum` ranges from 1 to `numrows*numcols`. By calling this syntax, every subsequent plotting command will plot to the subfigure indicated by `fignum`. If all three numbers are smaller than 10, the commas may be omitted. Below, we use this functionality to visualize the effect of exponentially damping a cosine function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { }, "outputs": [], "source": [ "def f(t):\n", " return np.exp(-t) * np.cos(2*np.pi*t)\n", "\n", "t1 = np.arange(0.0, 5.0, 0.1)\n", "t2 = np.arange(0.0, 5.0, 0.02)\n", "\n", "plt.subplot(2, 1, 1)\n", "plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')\n", "\n", "plt.subplot(2, 1, 2)\n", "plt.plot(t2, np.cos(2*np.pi*t2), 'r--')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 18.4. Histograms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above, we've seen the basics of line plotting in *Matplotlib*. However, also histograms can be plotted easily using the `plt.hist()` function. This function is introduced below to plot a basic histogram:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Fixing random state for reproducibility\n", "np.random.seed(19680801)\n", "\n", "mu, sigma = 100, 15\n", "x = mu + sigma * np.random.randn(10000)\n", "\n", "# the histogram of the data\n", "n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)\n", "\n", "\n", "plt.xlabel('Smarts')\n", "plt.ylabel('Probability')\n", "plt.title('Histogram of IQ')\n", "plt.text(60, .025, r'$\\mu=100,\\ \\sigma=15$')\n", "plt.axis([40, 160, 0, 0.03])\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chapter 19: Here be dragons!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To finish this last *Python* notebook, we suggest some topics you might want to dive into if you are interested." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 19.1. Scientific packages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We covered the *NumPy* and *Matplotlib* packages for scientific arithmetics and visualizing, respectively. However, depending on your needs, many more packages, libraries, and compilers may be of interest:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* [Autograd](https://github.com/HIPS/autograd): a library to automatically differentiate *Python* and *NumPy* code.\n", "* [h5py](http://www.h5py.org/): a library to interface the HDF5 binary data format to *Python*.\n", "* [mpmath](http://mpmath.org/): a library for real and complex floating-point arithmetic with arbitrary precision.\n", "* [SciPy](http://www.scipy.org/): a library for mathematics, science, and engineering.\n", "* [SymPy](http://www.sympy.org/en/index.html): a library for symbolic mathematics.\n", "* [Theano](http://deeplearning.net/software/theano/): a library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.\n", "* [Cython](http://cython.org/): a compiler to easily write *C* extensions for *Python*.\n", "* [Numba](http://numba.pydata.org/): a compiler for *Python* array and numerical functions that gives you the power to speed up your applications with high-performance functions written directly in *Python*.\n", "\n", "For a more extensive list, please check [this page](http://scipy.org/topical-software.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 19.2. Code checkers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To automatically check whether your *Python* code is written and formatted correctly, code checkers may be very useful. A few of them are mentioned here:\n", "\n", "* [Pylint](http://www.pylint.org/): a tool that checks for errors in Python code, tries to enforce a coding standard and looks for code smells.\n", "* [pycodestyle](http://pypi.python.org/pypi/pycodestyle): a tool to check your *Python* code against some of the style conventions in PEP 8.\n", "* [pydocstyle](http://pypi.python.org/pypi/pydocstyle): a tool to check compliance with *Python* docstring conventions.\n", "\n", "\n", "In this regard, it is also very important to consider **unit testing** for any piece of code you write. A unit test is an automated code-level test for a small \"unit\" of functionality. Unit tests are often designed to test a broad range of the expected functionality, including any weird corner cases and some tests that should not work. They tend to interact minimally with external resources like the disk, the network, and databases; testing code that accesses these resources is usually put under functional tests, regression tests, or integration tests. A detailed explanation on how to start using **nosetests** for unit testing can be found [here](http://ivory.idyll.org/articles/nose-intro.html). Note that, while these unit tests are typically defined on a small part of the code, the complete set of unit tests should completely cover the software. Without a **complete coverage**, these unit tests give no guarantee whatever that your code will work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 19.3. Collaborating on software using *Git*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When working with different people on a piece of software, a platform that keeps track of the different contributions and suggested changes can come in handy. [GitHub](http://github.com/) is such a code hosting platform for version control, built on the *Git* software. There is a free tutorial on this software on [Codecademy](http://www.codecademy.com/learn/learn-git) which takes about 1 hour and learns you the ropes of collaborating with *Git*. We recommend you to learn about this if you want to develop your own code, or extend existing codes. All CMM software packages are available on *GitHub* to promote contributing to them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Congratulations, you reached [the end](http://www.youtube.com/watch?v=DeumyOzKqgI&feature=youtu.be&t=33) of this four-part series on *Python*! While we hope that we've given you a basic idea of how this programming language can greatly help you throughout your master dissertation, you will, without a doubt, learn much more of its functionalities by using it. We also hope you've enjoyed it, and once again, congratulations on learning the [2017 top programming language](http://spectrum.ieee.org/computing/software/the-2017-top-programming-languages)!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }