Commit f23f019

authored
Assignment
0 parents  commit f23f019

10 files changed

Lines changed: 315 additions & 0 deletions

README.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Homework 5 - Advanced NumPy

## Deadline

The deadline for this homework is **Monday, 17th of May (2021-05-17 00:00:00 UTC+2)**.

## New Homework System

In response to the feedback we got through the *Homework System Survey* on StudIP, it is now only necessary to **solve 2/3 of the subtasks** in a homework in order to **pass** it. However, we still highly encourage you to try to solve all tasks whenever possible. Solving a problem on your own is the only way you can really know if you have understood a concept.

This change comes with a partly new set of commands that you can use:

* Run `pytest test_[FILE-TO-TEST].py` to see if you have passed a subtask (e.g. `pytest test_omitting_outliers.py`)
* Run `pytest` to see whether you have passed all subtasks - **this still works, but is not what determines the checkmark any more**
* Run `python pass_check.py` to check how many points you currently have (one subtask will usually give 10 points) - **this is what determines the checkmark now**

**The checkmark next to your commit on GitHub is still the deciding factor in whether you pass a homework or not.**

If you encounter any problems with the new homework system, first check the forum. If it seems to be an unknown issue, please write to [Martin](mailto:mpoemsl@uos.de) so that it can be fixed, or make a public post in the forum if the answer might be of interest to others.

## This Homework

This homework is about working with more advanced NumPy concepts. You will solve three tasks, one each in the following files:

* Omitting Outliers: `omitting_outliers.py`
* Rigorous Ranking: `rigorous_ranking.py`
* Printing Primes: `printing_primes.py`

Template function headers are given in the files. You may use the full functionality of NumPy, but **no other libraries**.

As always, you pass by pushing your solution to GitHub and having the green checkmark appear next to your commit. You can also ask our Telegram bot: `@uos_scipy_bot`. Running `python pass_check.py` on your own machine should also give the same result, but it's not what counts in the end!

### 1 Omitting Outliers

A common necessity in scientific programming is the removal of outliers, i.e. data points that were likely created by mistake. Write a function `remove_outliers` that takes in an n-dimensional (numeric) array `data` as a positional parameter, as well as keyword parameters `m` (default: `3`) and `replace` (default: `True`).

Your function should treat every value that is further than `m * sigma` away from the mean of `data` as an outlier, where `sigma` is the standard deviation of `data`.

- If `replace` is `True`, the function should return an array of the same shape as `data`, where every outlier is replaced with the mean of the original input array.
- If `replace` is `False`, the function should return a 1D array containing all values of the input array except for the outliers.

**Note:** For the purposes of this task, assume that mean and standard deviation may be calculated once in advance, even though at that point the outliers are still present in the dataset. There exist better methods for detecting outliers, but those are trickier to implement and beyond the scope of this task.

**Functions that may be helpful:** `np.mean`, `np.std`, `np.abs`, `np.ndarray.copy`
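
To get a feeling for the `m * sigma` rule, the following sketch applies it to a small made-up sample. It is an illustration only, not the reference solution; the sample values and variable names are assumptions chosen for this example.

```python
import numpy as np

# Made-up sample: five identical readings and one suspicious value.
data = np.array([10.0, 10.0, 10.0, 17.0, 10.0, 10.0])

mean = np.mean(data)
sigma = np.std(data)

# Absolute distance of every value from the mean.
distance = np.abs(data - mean)

# A value counts as an outlier when it lies further than m * sigma from the mean.
outliers_m3 = distance > 3 * sigma
outliers_m2 = distance > 2 * sigma

print(outliers_m3)
print(outliers_m2)
```

With `m=3` nothing is flagged here, because the single large value inflates both the mean and `sigma`; with `m=2` only the value `17` is flagged. This is the effect the note above refers to when it says that mean and standard deviation are computed while the outliers are still present.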

### 2 Rigorous Ranking

Write a function `compute_rank` that takes a numeric array `array` of any shape as input and computes the **rank** of each value.

The **rank** is an integer between 0 and `array.size - 1`. It signifies the position of each element in a *sorted* version of the array. For example: `[20, 30, -10]` should map to `[1, 2, 0]`.

In the test examples, there will be no arrays that include ties. However, a good stance to take would be to give the element that comes first in the original array the lower rank.

**Hint:** The solution to this assignment can potentially be very short, but that doesn't mean it's straightforward. This task will be easier if you have a good intuition for indexing with integer arrays. Additionally, it may be a good idea to think about 1D arrays first and then adapt your solution for more dimensions.

**Functions that may be helpful:** `np.argsort`, `np.arange`, `np.ndarray.flatten`
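
The hint about indexing with integer arrays can be illustrated on the 1D example from the task text. This is a minimal sketch of one possible approach, not necessarily the intended solution; the variable names are assumptions, and the step from 1D to arbitrary shapes (for instance via flattening and reshaping) is left out.

```python
import numpy as np

values = np.array([20, 30, -10])

# order[k] is the index of the k-th smallest element.
# A stable sort keeps tied elements in their original order.
order = np.argsort(values, kind="stable")

# Write position k back into the slot the k-th smallest element came from.
ranks = np.empty_like(order)
ranks[order] = np.arange(values.size)

print(ranks)
```

Here `order` is `[2, 0, 1]`, so the assignment places rank `0` at index 2, rank `1` at index 0 and rank `2` at index 1, which gives `[1, 2, 0]` as in the example above.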

### 3 Printing Primes

For this task, you are given two helper functions that are already implemented:

- `is_prime(n)` will return `True` if `n` is a prime number, `False` otherwise
- `pretty_print_bool_array(array)` will print a boolean array to the console such that `True` is printed as "x" and `False` is printed as "."

**Step 1:** Create a new function `is_prime_numpy` that works just like `is_prime`, but accepts an entire NumPy array as input and returns the result element-wise.

**Step 2:** Write a function `print_primes(rows, cols)`. Internally, it should create a boolean array of shape `(rows, cols)`. This array should be `True` exactly where the index is a prime number.

How is this supposed to work with a 2D array, you ask? We have a row and a column index after all. What I mean is that you should imagine the values of the array being enumerated from top-left to bottom-right for this task, as in the example below:

| **0** | **1** | **2** | **3** | **4** |
|:--:|:--:|:--:|:--:|:--:|
| **5** | **6** | **7** | **8** | **9** |
| **10** | **11** | **12** | **13** | **14** |
| **15** | **16** | **17** | **18** | **19** |

It is up to you to find a suitable way to convert from `(row, col)` to this number.

`print_primes` should then print this boolean array using the given pretty-print function. There should be no return value.

**Functions that may be helpful:** `np.vectorize` and `np.indices` (only discussed in the Jupyter notebook)
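
The two functions mentioned above can be tried out on the 4 x 5 grid from the table. The sketch below is only meant to show how they behave; `is_even` is a hypothetical stand-in for a scalar predicate such as `is_prime`, and the row-major formula is one possible conversion, not necessarily the one the authors had in mind.

```python
import numpy as np

rows, cols = 4, 5

# np.indices returns one array of row indices and one of column indices.
row_idx, col_idx = np.indices((rows, cols))

# Row-major enumeration: every full row above contributes `cols` entries.
flat_index = row_idx * cols + col_idx


def is_even(n):
    """Hypothetical scalar predicate, standing in for is_prime."""
    return n % 2 == 0


# np.vectorize wraps a scalar function so that it can be called on arrays.
is_even_numpy = np.vectorize(is_even)
mask = is_even_numpy(flat_index)

print(flat_index)
print(mask)
```

`flat_index` reproduces the numbering of the table, so the entry at row 2, column 3 is `13`. Note that `np.vectorize` is a convenience wrapper around a Python loop, so it does not make the wrapped function faster.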

helpers.py

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
import numpy as np

import types

def is_prime(n):
    """simple function to check if a given integer is prime"""
    # no change necessary here

    if n <= 1:
        return False

    if n > 2 and n % 2 == 0:
        return False

    for i in range(3, int(np.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False

    return True


def pretty_print_bool_array(array):
    """this function will print a boolean array such that True values are 'x'
    and False values are '.'"""

    with np.printoptions(formatter={"bool": lambda b: "x" if b else "."}):
        print(array)


def imports_of_your_file(filename, testfile):
    """ Yields all imports in the testfile. """

    for name, val in vars(testfile).items():
        if isinstance(val, types.ModuleType):
            # get direct imports
            yield val.__name__

        else:
            # get from x import y imports
            imprt = getattr(testfile, name)

            if hasattr(imprt, "__module__") and not str(imprt.__module__).startswith("_") and not str(imprt.__module__) == filename:
                yield imprt.__module__

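
As a small illustration of how `pretty_print_bool_array` works, the sketch below applies the same `np.printoptions` formatter to a made-up 2 x 2 array and collects the text with `np.array2string` instead of printing it directly. The array and the variable names are assumptions for this example.

```python
import numpy as np

mask = np.array([[True, False],
                 [False, True]])

# Inside the context manager, booleans are rendered through the custom formatter.
with np.printoptions(formatter={"bool": lambda b: "x" if b else "."}):
    text = np.array2string(mask)

print(text)

# Outside the context manager, the default formatting is back in effect.
default_text = np.array2string(mask)
print(default_text)
```

Because the formatter is installed through a context manager, the change is temporary and does not affect any printing that happens after the `with` block.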

omitting_outliers.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
import numpy as np

#def remove_outliers(data, m=3, replace=True):
    # implement task 1 here
    #raise NotImplementedError

def remove_outliers(data, m=3, replace=True):
    mean = np.mean(data)
    sigma = np.std(data)
    distance_from_mean = np.abs(data - mean)
    # a value is an outlier if it is further than m * sigma away from the mean
    is_outlier = distance_from_mean > m * sigma
    if replace:
        # keep the shape and put the mean wherever an outlier was found
        return np.where(is_outlier, mean, data)
    else:
        # boolean indexing returns a 1D array without the outliers
        return data[~is_outlier]


if __name__ == "__main__":
    # use this for your own testing!

    data = np.array([10, 10, 10, 17, 10, 10])
    print(remove_outliers(data, m=3, replace=False))

pass_check.py

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
from test_omitting_outliers import test_remove_outliers
from test_rigorous_ranking import test_compute_rank
from test_printing_primes import test_print_primes

import pytest

SUBTASKS = {
    "Omitting Outliers": (test_remove_outliers, 10),
    "Rigorous Ranking": (test_compute_rank, 10),
    "Printing Primes": (test_print_primes, 10)
}

def test_entire_homework():

    points_achieved = 0
    points_possible = 0

    print("Checking {} subtasks to see if you reached at least 2/3 of the possible points ...\n".format(len(SUBTASKS)))

    for subtask_name, (subtask_test_function, subtask_points) in SUBTASKS.items():

        points_possible += subtask_points

        try:

            subtask_test_function()

            # this line will only be reached if no AssertionError or other Exception was thrown in test_function()
            points_achieved += subtask_points

            print("Subtask '{}' with {} points: All good here!".format(subtask_name, subtask_points))

        except Exception:

            print("Subtask '{}' with {} points: Something went wrong. Better run 'pytest' on the corresponding pytest file to learn more!".format(subtask_name, subtask_points))

    if points_achieved >= (2/3) * points_possible:

        print("\nCongratulations! You passed this homework with {} out of {} points.".format(points_achieved, points_possible))
        print("Do not forget to commit and push the current state of your code - the pass is only official once you see the checkmark on GitHub!")
        exit(0)

    else:

        print("\nSorry! You only achieved {} points which is less than 2/3 of the possible {} points.".format(points_achieved, points_possible))
        exit(1)

if __name__ == "__main__":
    test_entire_homework()
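
The pass condition compares the achieved points against the float `(2/3) * points_possible`. A possible alternative, sketched below, is to cross-multiply so that the comparison stays in integers. The function name `has_passed` is hypothetical and not part of this commit.

```python
def has_passed(points_achieved, points_possible):
    """Return True if at least two thirds of the possible points were reached.

    Cross-multiplying keeps the comparison in integers, so the result
    does not depend on how the fraction 2/3 is rounded.
    """
    return 3 * points_achieved >= 2 * points_possible


# Two of three subtasks at 10 points each is exactly the threshold.
print(has_passed(20, 30))
print(has_passed(10, 30))
```

With three subtasks worth 10 points each, two solved subtasks (20 points) meet the threshold and one solved subtask (10 points) does not.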

printing_primes.py

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
from helpers import is_prime, pretty_print_bool_array

import numpy as np

# implement step 1 of task 3: printing primes here
is_prime_numpy = None

def print_primes(rows, cols):
    # implement step 2 of task 3: printing primes here
    raise NotImplementedError

requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
pytest>=4
numpy>=1.20

rigorous_ranking.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
import numpy as np

def compute_rank(array):

    # implement task 2: rigorous ranking here
    raise NotImplementedError


test_omitting_outliers.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
from helpers import imports_of_your_file

from hashlib import sha1

import numpy as np

try:
    import omitting_outliers as testfile
except ModuleNotFoundError:
    assert False, "The name of your file is supposed to be 'omitting_outliers.py'!"


def test_remove_outliers(filename="omitting_outliers", allowed_imports={"numpy"}):
    data = np.array([[-2, 1, 1, 0, 2, -1, 1, -3, -3, -2, -3, -3,-42, 1, 2, 1, 0, 1, -1, -1],
                     [-2, -3, 2, 2, -3, 42, 0, 2, 0, -1, -3, 1, 2, 1, -3, 0, 0, -2, 0, 1]])

    data = data.astype(float)

    result = testfile.remove_outliers(data)

    assert result.shape == data.shape, "If replace is True, your function should preserve the shape of the array"
    assert sha1(result).hexdigest() == "52ffe05d09dbe4135e1a1a16e113a1a578775a56", "Your function does not seem to return the correct result"

    result = testfile.remove_outliers(data, m=4.4)

    assert sha1(result).hexdigest() == "15e350705c1a1d2b9a8cb16b0cf05b2266a9a2fd", "Your function does not seem to return the correct result"

    result = testfile.remove_outliers(data, replace=False)

    assert result.shape == (38, ), "If replace is False, your function should reduce the number of elements"
    assert sha1(result).hexdigest() == "c51ef6b5de6a9654dcf2d4f69de324eb2591cac6", "Your function does not seem to return the correct result"

    assert set(imports_of_your_file(filename, testfile)) <= allowed_imports, "You are not allowed to import any modules except NumPy!"

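
The test compares SHA-1 digests of the returned arrays. Such a digest is computed over the raw bytes of the array, so it depends on the dtype as well as on the values, which is presumably why the test data is converted with `astype(float)` first. The sketch below shows this with made-up values; the variable names are assumptions.

```python
from hashlib import sha1

import numpy as np

as_int = np.array([1, 2, 3])
as_float = as_int.astype(float)

# Same numbers, different byte representation, hence different digests.
digest_int = sha1(as_int).hexdigest()
digest_float = sha1(as_float).hexdigest()
print(digest_int == digest_float)

# An identical copy with the same dtype hashes to the same digest.
digest_copy = sha1(as_float.copy()).hexdigest()
print(digest_float == digest_copy)
```

A consequence for solutions is that a numerically correct result with a different dtype (for example integers instead of floats) would not match the expected digest.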

test_printing_primes.py

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
from helpers import imports_of_your_file

from hashlib import sha1
from io import StringIO

import numpy as np

import sys

try:
    import printing_primes as testfile
except ModuleNotFoundError:
    assert False, "The name of your file is supposed to be 'printing_primes.py'!"


def test_print_primes(filename="printing_primes", allowed_imports={"numpy", "helpers"}):

    assert callable(testfile.is_prime_numpy), "Your script does not have a function 'is_prime_numpy'"

    try:
        testfile.is_prime_numpy(np.arange(3))
    except ValueError:
        assert False, "Your 'is_prime_numpy' function does not seem to be able to handle NumPy arrays"

    test_stdout = StringIO()
    saved_stdout = sys.stdout

    sys.stdout = test_stdout

    result = testfile.print_primes(20, 20)

    sys.stdout = saved_stdout

    s = test_stdout.getvalue()

    assert result is None, "Your function 'print_primes' should not have a return value"
    assert len(s) > 0, "Your function 'print_primes' should *print* its result!"
    assert sha1(s.encode("utf-8")).hexdigest() == "3c49338521ff33568de33bdeb89ae5b83a5b6c45", "Your function 'print_primes' does not seem to print the correct result"

    assert set(imports_of_your_file(filename, testfile)) <= allowed_imports, "You are not allowed to import any modules except NumPy and helpers.py!"

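
The test captures printed output by swapping `sys.stdout` by hand. If the function under test raises, `sys.stdout` is not restored. A possible alternative from the standard library is `contextlib.redirect_stdout`, sketched below; `noisy` is a hypothetical stand-in for a function like `print_primes`.

```python
from contextlib import redirect_stdout
from io import StringIO


def noisy():
    """Hypothetical stand-in for a function that prints and returns nothing."""
    print("x . x")


buffer = StringIO()

# stdout is redirected only inside the with block and is restored
# afterwards, even if the called function raises an exception.
with redirect_stdout(buffer):
    result = noisy()

captured = buffer.getvalue()
print(repr(captured))
```

After the block, `captured` holds everything the function printed, including the trailing newline, and `result` is `None` because the function has no return value.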

test_rigorous_ranking.py

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
from helpers import imports_of_your_file

import numpy as np

try:
    import rigorous_ranking as testfile
except ModuleNotFoundError:
    assert False, "The name of your file is supposed to be 'rigorous_ranking.py'!"


def test_compute_rank(filename="rigorous_ranking", allowed_imports={"numpy"}):

    data = np.random.choice(100, size=(5, 5), replace=False)
    rank = testfile.compute_rank(data)

    assert rank.shape == data.shape, "The returned array does not have the same shape as the input array"
    assert np.array_equal(np.sort(np.unique(rank)), np.arange(rank.size)), "The returned array contains invalid values for a rank"
    assert np.array_equal(data.argsort(), rank.argsort()), "The returned array does not contain the correct ranks"

    assert set(imports_of_your_file(filename, testfile)) <= allowed_imports, "You are not allowed to import any modules except NumPy!"

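
One detail of the last rank assertion is worth noting: `argsort` without an `axis` argument sorts along the last axis, so on a 2D array the comparison is made row by row. The sketch below uses a made-up 2 x 2 example to show a ranking that is consistent within each row but wrong globally, and how `axis=None` would compare the global order instead. The arrays and names are assumptions for this example, not part of the test file.

```python
import numpy as np

data = np.array([[10, 20],
                 [30, 40]])

# Hypothetical wrong answer: the order inside each row is right,
# but the two rows are swapped in the global ranking.
wrong_rank = np.array([[2, 3],
                       [0, 1]])

# Default argsort works along the last axis, i.e. row by row.
row_wise_match = np.array_equal(data.argsort(), wrong_rank.argsort())

# axis=None sorts the flattened array and so compares the global order.
global_match = np.array_equal(data.argsort(axis=None),
                              wrong_rank.argsort(axis=None))

print(row_wise_match, global_match)
```

In this example the row-wise comparison accepts the wrong ranking while the flattened comparison rejects it, so a stricter check could pass `axis=None` to both `argsort` calls.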
