NumPy View vs. NumPy Copy

8 min to complete · By Gilad Gressel

NumPy arrays are mutable, which we saw in a previous lesson.

The trick to working with mutable objects is to know when you are working with a NumPy view and when you are working with a NumPy copy. If you are working with a view, any change you make through it also changes the original array, so you run the risk of side effects. If you are working with a copy, your changes stay in the copy and the original is safe.

This lesson will help you determine whether you are working with a view vs. a copy.

Why Return a View?

The first question you may ask is, why return a view at all? The answer is that returning a view uses far less memory than returning a copy, because the data is not copied. Instead, the view is just a different way of looking at the same data in memory. Oftentimes, in data science, we are just "looking around," and views are great for that because we are only examining things. We don't need to copy the data.
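As a rough illustration of the memory point, the sketch below compares a slice with an explicit copy of the same slice. The array size and the variable names are arbitrary choices for this example; np.shares_memory and the flags.owndata attribute are used only to show which array owns its data.

```python
import numpy as np

a = np.arange(1_000_000)

view = a[::2]            # basic slicing: no data is copied
copied = a[::2].copy()   # explicit copy: a new block of memory is allocated

print(np.shares_memory(a, view))    # True, the view reuses a's memory
print(np.shares_memory(a, copied))  # False, the copy has its own memory
print(view.flags.owndata)           # False
print(copied.flags.owndata)         # True
```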

When Do We Get a NumPy View?

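Basic indexing will create a view. Basic indexing means indexing with slices and integers inside the square brackets. For example, a[5:8] is basic indexing and returns a view of a. (Selecting a single element of a 1-D array, such as a[0], returns a scalar rather than an array.)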

import numpy as np
a = np.arange(12)
print(a[5:8])
b=a[5:8]
print(b)
[5 6 7]
[5 6 7]
# now b is a view of the data in a, so if 
# we update b, it will update the data in a
b[0]=50
print(b)
print(a)
[50  6  7]
[ 0  1  2  3  4 50  6  7  8  9 10 11]
print(b.base) 
# when we print the base of b, 
# we see it is the original array a
[ 0  1  2  3  4 50  6  7  8  9 10 11]
print(a.base) 
# this will be None, 
# since a is the original array
None
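The same behavior holds for a 2-D array. The following is a small sketch, with an array m invented for this example, showing that selecting a row or a column with basic indexing gives a view, so writes through it reach the original array.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

row = m[0]      # integer index on a 2-D array: a 1-D view of the first row
col = m[:, 1]   # slice plus integer: a 1-D view of the second column

row[0] = 99     # writes into m[0, 0]
col[1] = -1     # writes into m[1, 1]

print(m.tolist())                 # [[99, 1, 2], [3, -1, 5]]
print(np.shares_memory(m, row))   # True
print(np.shares_memory(m, col))   # True
```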

What Returns a NumPy Copy?

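Advanced indexing, such as selecting with a boolean mask or with an array of integer indices, always returns a copy. Arithmetic such as b*2 also builds a new array rather than modifying the existing one.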

c = a[a>5]
print(c)
[50  6  7  8  9 10 11]
c[0]=100 
# this will not update a because c holds a copy of the selected elements
print(c)
print(a)
[100   6   7   8   9  10  11]
[ 0  1  2  3  4 50  6  7  8  9 10 11]
print(c.base) 
# this will be None since c owns its own data
None
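Indexing with a list of integers behaves the same way as a boolean mask. The sketch below (variable names are illustrative) also shows one point that is easy to mix up: reading with a mask gives a copy, but assigning through a mask on the left-hand side writes directly into the original array.

```python
import numpy as np

a = np.arange(12)

picked = a[[1, 3, 5]]   # integer-array indexing: a copy
picked[0] = -1
print(a[1])             # 1, the original is untouched
print(picked.base)      # None

a[a > 8] = 0            # assignment through a mask modifies a in place
print(a)                # [0 1 2 3 4 5 6 7 8 0 0 0]
```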
# slicing a multi-dimensional array with basic slices still returns a view
a = np.arange(12).reshape(3,4)
print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
b = a[0:2, 0:2]
print(b)
print()
# b*2 builds a new array and rebinds the name b to it;
# the view itself is not modified
b=b*2 
print(b)
print()
# a is not updated since b now refers to the new array
print(a) 
[[0 1]
 [4 5]]
    
[[ 0  2]
 [ 8 10]]
    
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
print(b.base) # None because b now names the new array created by b*2
None
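To see what the 2-D slice itself is, it helps to separate the slicing step from the arithmetic step. In this sketch the slice is checked with np.shares_memory and then modified with the in-place operator *=, which writes through the view instead of building a new array.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a[0:2, 0:2]                  # basic slicing on two axes

print(np.shares_memory(a, b))    # True, b is a view of a's data

b *= 2                           # in-place multiply: writes through the view
print(a.tolist())
# [[0, 2, 2, 3], [8, 10, 6, 7], [8, 9, 10, 11]]
```

By contrast, b = b * 2 evaluates b * 2 into a fresh array and then rebinds the name b, which is why the original array stays unchanged in that case.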

The Basic Issue with NumPy Views

The basic issue is that NumPy uses views to save us memory, which we like. However, the rules for when things return views vs copies can be ambiguous. For example, you will see this phrase a lot in the documentation:

The XXXX function creates a view where possible or a copy otherwise.

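This can be challenging for us: the same call can return a view for one array and a copy for another, so you cannot always tell which one you have just by reading the code.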
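reshape is one function documented this way. The sketch below is only an illustration with made-up variable names: reshaping a contiguous array can reuse the existing memory, while reshaping a transposed array cannot, so NumPy falls back to a copy.

```python
import numpy as np

a = np.arange(6)

r1 = a.reshape(2, 3)              # contiguous data: reshape returns a view
print(np.shares_memory(a, r1))    # True

t = r1.T                          # transpose: a view with a different memory layout
r2 = t.reshape(6)                 # this layout cannot be flattened without copying
print(np.shares_memory(t, r2))    # False
print(r2.tolist())                # [0, 3, 1, 4, 2, 5]
```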

How to Make Sure You Return a NumPy Copy

If you are not sure whether you have a view, you can check with .base. If you want to make sure you have a copy, you can use .copy().

a = np.arange(12)
print(a[5:8])
# b will be a view of a, but we can force it to copy
b=a[5:8].copy() 
print(b)
print(b.base) # will be None
[5 6 7]
[5 6 7]
None
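One common place to use .copy() is inside a function that should not change its caller's data. The helper zero_first below is hypothetical, written only for this example: it copies its input first, so the caller's array is left untouched even when a view is passed in.

```python
import numpy as np

def zero_first(values):
    """Hypothetical helper: return a new array with the first element set to 0."""
    result = values.copy()   # work on a copy so the caller's data is untouched
    result[0] = 0
    return result

a = np.arange(12)
window = a[5:8]              # a view of a

out = zero_first(window)
print(out)                   # [0 6 7]
print(a[5:8])                # [5 6 7], a is unchanged
```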

Summary: NumPy View vs. NumPy Copy

  • NumPy arrays are mutable, and operations on arrays can create either views (which share the same data in memory as the original array) or copies (which are entirely new arrays with their own data in memory).
  • Basic indexing operations like arr[start:stop], including multi-dimensional slices like arr[0:2, 0:2], create views, while advanced indexing like arr[arr > 5] and arithmetic like arr * 2 create new arrays with their own data.
  • NumPy views are memory-efficient but can lead to unintended side effects if the original array is modified through the view.
  • The .base attribute can check if an array is a view (arr.base returns the original array for a view and None for an array that owns its data), and .copy() can force creating a copy.
  • Understanding the distinction between a NumPy view and NumPy copy is crucial for avoiding bugs and managing memory usage efficiently when working with large NumPy arrays in data analysis and scientific computing.