From 7179ea712993c253ca752b625c534f4a54c1772c Mon Sep 17 00:00:00 2001 From: pymilo Date: Mon, 29 Jan 2018 14:38:20 -0800 Subject: [PATCH] something --- .../hw_1_assignment-checkpoint.ipynb | 574 ++++++++++++++++++ 1 file changed, 574 insertions(+) create mode 100644 Homeworks/hw_1/.ipynb_checkpoints/hw_1_assignment-checkpoint.ipynb diff --git a/Homeworks/hw_1/.ipynb_checkpoints/hw_1_assignment-checkpoint.ipynb b/Homeworks/hw_1/.ipynb_checkpoints/hw_1_assignment-checkpoint.ipynb new file mode 100644 index 0000000..961c6a6 --- /dev/null +++ b/Homeworks/hw_1/.ipynb_checkpoints/hw_1_assignment-checkpoint.ipynb @@ -0,0 +1,574 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Homework 1: Numpy, Scipy, Pandas\n", + "\n", + "### Due Friday Sept 9, 2016 @ 9am\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## #0: Get set up with your environment to work on and submit homework\n", + "\n", + "a. Create a new homework repository at github\n", + "\n", + "\n", + "\n", + "Name your repo something sensible (e.g. python-ay250-homeworks). Given your Berkeley affiliation you should be able to get private repos if you'd like.\n", + "\n", + "\n", + "\n", + "\n", + "b. Clone this repo locally and make a directory for this week's homework:\n", + "\n", + "```bash\n", + "cd /class/directories ## this will be different on your machine...whereever you want to keep your work.\n", + "\n", + "# change to your github username:\n", + "git clone https://github.com//python-ay250-homework.git\n", + "cd python-ay250-homework\n", + "mkdir hw_1\n", + "echo \"hw_1 README\" >> hw_1/README.md\n", + "git add hw_1/README.md\n", + "git commit hw_1/README.md -m \"added hw_1 directory\"\n", + "git push\n", + "```\n", + "\n", + "c. Copy this notebook into your `hw_1` folder from a local version of the python-seminar repo\n", + "\n", + "```bash\n", + "cd /class/directories\n", + "git clone https://github.com/profjsb/python-seminar.git \n", + "cd python-seminar\n", + "git pull\n", + "cp Homeworks/hw_1/* /class/directories/python-ay250-homework/hw_1/\n", + "```\n", + "\n", + "d. Get working! Be sure to check in your work as often as you'd like\n", + "\n", + "```bash\n", + "cd /class/directories/python-ay250-homework\n", + "git add hw_1/\n", + "git commit -m \"this is a check in\"\n", + "```\n", + "\n", + "e. To submit your work, send us (prof+GSIs) your github handle and repo name for us to clone (you'll need to add us to the repo if you've made a private one)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## #1: Super-resolution imaging \n", + "\n", + "Obtaining several snapshots of the same scene, from microscopes to telescopes, is useful for the postprocessing increase of signal to noise: by summing up imaging data we can effectively beat down the noise. Interestingly, if we image the same scene from different vistas we can also improve the clarity of the combined image. Being able to discern features in a scene from this combination effort is sometimes called super-resolution imaging.\n", + "\n", + "Here, we'll combine about 4 seconds of a shaky video to reveal the statement on a license plate that is not discernable in any one frame.\n", + "\n", + "\n", + "\n", + "A tarball of the data is at: https://drive.google.com/open?id=0B4vIeCR-xYNnbXFJTTVlVnpUZkk\n", + "\n", + "```bash\n", + "tar -xvzf homework1_data.tgz # do NOT check this files into git...\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 1** Read in each image into a `numpy` array. Resize each frame to be 3 times larger in each axis (ie. 9 times larger images). Using `scipy.signal.fftconvolve` find the offsets of each frame with respect to the first frame. Report those offsets to 2 decimal places. \n", + "\n", + " - Hint1: you'll need to figure out how to resize a numpy array\n", + " - Hint2: you'll want to reverse the second image when doing the convolution: `scipy.signal.fftconvolve(im1, im2[::-1, ::-1])`\n", + " - Hint3: you'll need to figure out how to identify the peak of the fft convolution to find the offsets between images" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 2** Shift each image to register the frames to the original (expanded in size) frame. You should, in general, be shifting by subpixel offsets. You might want to look at `scipy.ndimage.interpolation.shift`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 3** Combine all the registered images to form a super-resolution image. What does the license plate read?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# #2: An elementary introduction to spectral audio compression\n", + "\n", + "In this problem, we'll explore the very basics of audio compression in the spectral domain using numpy and scipy. We'll do a bit of visualization with matplotlib, but since that is covered later in the course, we'll provide those functions for you.\n", + "\n", + "Audio compression is a large and complex topic, and the design of a format for compressed audio such as the popular [MP3](http://en.wikipedia.org/wiki/MP3) is too complex to cover in detail here. However, we will introduce the basic tools that most such compression formats use, namely:\n", + "\n", + "1. Converting the input signal to the frequency domain by taking a Fast Fourier Transform (FFT).\n", + "\n", + "2. Dropping information in the frequency domain, resulting in a smaller amount of data.\n", + "\n", + "3. Reconstructing back the signal in the time domain from this smaller representation of the signal.\n", + "\n", + "Steps 1 and 2 above are the 'encoding' part of signal compression, and step 3 is the 'decoding' part. For this reason, the tools that perform these steps are typically referred to as signal 'codecs', short for encoders/decoders.\n", + "\n", + "Note that here we say 'signal': while MP3 is an audio format, the same ideas apply to the compression of digital images with formats such as JPEG and video. Virtually all multimedia technologies we use today, from audio players to cell phones, digital cameras and YouTubeVideo, are based on sophisticated extensions and applications of these simple ideas." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's first load the plotting tools and importing some tools we'll need later:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "%pylab inline\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# we'll need some path manipulations later on\n", + "import os" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We define a simple utility function to listen to audio files right in the browser:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def Audio(fname):\n", + " \"\"\"Provide a player widget for an audio file.\n", + " \n", + " Parameters\n", + " ==========\n", + " fname : string\n", + " Filename to be played.\n", + " \n", + " Warning\n", + " =======\n", + " \n", + " Browsers cache audio very aggressively. If you change an\n", + " audio file on disk and are trying to listen to the new version, you \n", + " may want to \n", + " \"\"\"\n", + " from IPython.display import HTML, display\n", + " \n", + " # Find out file extension and deduce MIME type for audio format\n", + " ext = os.path.splitext(fname)[1].replace('.', '').lower()\n", + " mimetype = 'audio/' + ('mpeg' if ext == 'mp3' else ext)\n", + " \n", + " tpl = \"\"\"

{fname}:

\n", + "\n", + "\"\"\"\n", + " display(HTML(tpl.format(**locals())))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We also define a convenience wrapper around `plt.specgram`, [matplotlib's spectrogram function](http://matplotlib.org/api/mlab_api.html#matplotlib.mlab.specgram), with a colorbar and control over the color limits displayed. This will make it easier to compare across different signals with the same colors for all inputs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def specgram_cbar(x, title=None, clim=(0, 80) ):\n", + " \"\"\"Plot spectrogram with a colorbar and range normalization.\n", + " \n", + " Call matplotlib's specgram function, with a custom figure size, \n", + " automatic colobar, title and custom color limits to ease \n", + " comparison across multiple figures.\n", + " \n", + " Parameters\n", + " ==========\n", + " x : array\n", + " One-dimensional array whose spectrogram should be plotted.\n", + " \n", + " title : string\n", + " Optional title for the figure.\n", + " \n", + " clim : 2-tuple\n", + " Range for the color limits plotted in the spectrogram.\n", + " \"\"\"\n", + " f = plt.figure(figsize=(10,3))\n", + " plt.specgram(x)\n", + " plt.colorbar()\n", + " plt.clim(*clim)\n", + " if title is not None:\n", + " plt.title(title)\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 1**: Use the `Audio` function above to listen to the signal we will be experimenting with, a simple voice recording stored in the file `Homeworks/hw1/data/voice.wav`.\n", + "\n", + "Note: if your browser doesn't support audio, you may try a different browser. We've tested current versions of Chrome and Firefox, and it works OK with both." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 2**: Write a function to compress a 1-d signal by dropping a fraction of its spectrum. \n", + "\n", + "You can drop the smallest components by setting their values to zero.\n", + "\n", + "*Hints*: \n", + "\n", + "- look at the `np.fft` module, keeping in mind that your input signal is real.\n", + "- look at the `argsort` method of numpy arrays." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def compress_signal(x, fraction):\n", + " \"\"\"Compress an input signal by dropping a fraction of its spectrum.\n", + " \n", + " Parameters\n", + " ==========\n", + " x : array\n", + " 1-d real array to be compressed\n", + " \n", + " fraction : float\n", + " A number in the [0,1] range indicating which fraction of the spectrum\n", + " of x should be zeroed out (1 means zero out the entire signal).\n", + " \n", + " Returns\n", + " =======\n", + " x_approx : array\n", + " 1-d real array reconstructed after having compressed the input.\n", + " \"\"\"\n", + " # your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As a quick visual check (not that this is *not* a formal test of correctness), experiment with a simple random signal by changing the compression ratio and plotting both the signal and the compressed version:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "x = np.random.rand(128)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "fraction = 0.6 # play changing this in the 0-1 range\n", + "\n", + "xa = compress_signal(x, fraction)\n", + "\n", + "plt.figure(figsize=(12,3))\n", + "plt.plot(x, alpha=0.5, lw=2, label='original')\n", + "plt.plot(xa, lw=2, label='compressed {0:.0%}'.format(fraction))\n", + "plt.legend();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 3**: Write a function that will compress an audio file by a dropping a fraction of its spectrum, writing the output to a new file.\n", + "\n", + "If the input file is named `a.wav` and the compression fraction is 0.9, the output file should be named `a_comp_0.9.wav`.\n", + "\n", + "*Hints:* \n", + "\n", + "- look at the `scipy.io` module for routines dealing with files in `wav` format.\n", + "\n", + "- you may need to use the `astype` method of numpy arrays to get the correct data type for `wav` files." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def compress_wav(fname, fraction):\n", + " \"\"\"Compress an audio signal stored in an input wav file.\n", + " \n", + " The compressed signal is returned as a numpy array and automatically written \n", + " to disk to a new wav file.\n", + " \n", + " Parameters\n", + " ==========\n", + " fname : string\n", + " Name of the input wav file\n", + " \n", + " fraction : float\n", + " Fraction of input data to keep.\n", + " \n", + " Returns\n", + " =======\n", + " rate : int\n", + " Bit rate of the input signal.\n", + "\n", + " x : array\n", + " Raw data of the original input signal.\n", + " \n", + " x_approx : array\n", + " Raw data of the compressed signal.\n", + " \n", + " new_fname : string\n", + " Auto-generated filename of the compressed signal.\n", + " \"\"\"\n", + " \n", + " # your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 4**: Study the effect of compressing the input file at different ratios: 0.1, 0.5, 0.75, 0.9, 0.95, 0.99.\n", + "\n", + "Using the `OrderedDict` class from the [Python collections module](http://docs.python.org/2/library/collections.html#collections.OrderedDict), store the uncompressed signal as well as the compressed array and filename for each compression ratio.\n", + "\n", + "You will create an `OrderedDict` called `voices`, with:\n", + "\n", + "- keys: compression ratios\n", + "- values: pairs of (x, filename) where x is the compressed audio and filename is the name of the compressed file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 5**: Loop over the `voices` dict, and for each one generate an audio player as well as a spectrogram. Observe how the spectrogram changes, and listen to each file. At what ratio do you stop understanding the recording?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# #3: Armchair Astronomer\n", + "\n", + "Often times, people act as good sensors of the physical universe. We can use Google Trends data to help us determine some fundamental parameters of the Solar System.\n", + "\n", + "**Problem 1**: Using just the CSV file we created in the pandas lecture (`merged_data.csv`) and some frequency analysis tools in `scipy` to determine:\n", + "\n", + " - the number of days in a year\n", + " - the period of the moon's orbit around the Earth\n", + " \n", + "Hint: `from scipy.signal.spectral import lombscargle`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# #4: Reproducing some insights about the Election\n", + "\n", + "Nate (\"not a genius, just a Bayesian\") Silver writes often about polls and their utility of predicting elections. One of the things he emphasized during the 2016 campaign is that even \"large\" polls of people with a consistent lead for one candidate will show wild swings in any given window in time.\n", + "\n", + "**Problem 1**: Using Pandas and `numpy`, try to reproduce this plot from a Nate Silve Tweet qualitatively using the same assumptions.\n", + "\n", + "\n", + "\n", + "https://twitter.com/NateSilver538/status/769565612955824128" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 2**: Clearly, even with a 6% point lead, there's a chance that this sort of poll would show the other person in the lead. How much would ahead (in percent) would a candidate need to be to have a tracking poll never show the other candidate to be ahead over the course of a year (in your simulation)?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Problem 3**: With a 3 and 6% lead, how many people would need to be polled in 1 day to have the rolling 5-day poll result always show the leader ahead (over a year)?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your code here" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.4" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}