NumPy Indexing & NumPy Slicing

23 min to complete · By Gilad Gressel

In this lesson, you will learn how to index and slice NumPy arrays. You will also learn how to use boolean indexing to select elements from NumPy arrays that satisfy certain conditions. We will start with the basics of NumPy array indexing and NumPy array slicing and then move on to more advanced techniques.

NumPy Array Indexing

NumPy arrays are zero-indexed, the same as Python lists. This means that the index of the first element is 0, and the index of the last element is n-1, where n is the number of elements in the array. You can use square brackets to index a NumPy array, just like you would with a Python list. Further, NumPy follows the same start:stop:step convention for slicing arrays. Below are some examples of NumPy array indexing:

import numpy as np
a = np.arange(10)
print(a)
print(a[5]) # element at index 5 (the sixth element)
[0 1 2 3 4 5 6 7 8 9]
5
a[:5] # first five elements
array([0, 1, 2, 3, 4])
a[5:] # elements from index 5 onward
array([5, 6, 7, 8, 9])
a[::2] # every other element
array([0, 2, 4, 6, 8])
a[::-1] # reversed array
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
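
The same conventions extend to negative indices, which count from the end of the array just as they do with Python lists. The short sketch below is an illustration added here, not part of the original lesson; the variable names are made up for the example.

```python
import numpy as np

a = np.arange(10)

last = a[-1]         # last element, same as a[len(a) - 1]
last_three = a[-3:]  # the final three elements
middle = a[2:-2]     # drop two elements from each end
stepped = a[1:8:3]   # start at index 1, stop before index 8, step by 3

print(last)        # 9
print(last_three)  # [7 8 9]
print(middle)      # [2 3 4 5 6 7]
print(stepped)     # [1 4 7]
```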

Assignment in NumPy Arrays

When you want to assign a value to a specific element or slice of a NumPy array, you can use indexing to select it and then use the assignment operator (=) to assign the value. Below is an example that assigns a single value to every element of a slice.

print(a)
a[5:8] = 12 # assign value to slice
print(a)
[0 1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4 12 12 12  8  9]

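Note: NumPy Arrays are Mutable! Assigning an array to a new name does not copy it, so a change made through one name shows up under the other.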

a = np.arange(10)
b = a
b[0] = 99
print(a)
print(b)
[99  1  2  3  4  5  6  7  8  9]
[99  1  2  3  4  5  6  7  8  9]
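
To make the distinction concrete, here is a small sketch comparing a second name for the same array, a slice (which NumPy returns as a view), and an explicit copy made with `.copy()`. The names `alias`, `view`, and `independent` are chosen only for this illustration.

```python
import numpy as np

a = np.arange(10)

alias = a               # another name for the same array object
view = a[2:5]           # a basic slice is a view onto a's data
independent = a.copy()  # a separate array with its own data

alias[0] = 99           # visible through a
view[0] = -1            # writes to a[2]
independent[1] = 500    # does not touch a

print(a)                                 # [99  1 -1  3  4  5  6  7  8  9]
print(alias is a)                        # True
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
```

If you need to modify an array without affecting the original, calling `.copy()` first is the usual approach.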

More Advanced Assignments

In NumPy, you can assign whole ranges quite easily:

a[:5] = np.arange(10,15)
a
array([10, 11, 12, 13, 14,  5,  6,  7,  8,  9])
a[:5] += 100 # you can also perform operations on subsections of your array
a
array([110, 111, 112, 113, 114,   5,   6,   7,   8,   9])
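
One detail worth seeing in action: a single value is spread across the whole slice, but a sequence has to be compatible with the slice's length. The following is a minimal sketch of that rule, written for this section rather than taken from the lesson; the `failed` flag exists only to record what happened.

```python
import numpy as np

a = np.arange(10)

a[:5] = 0                      # a scalar is broadcast to all five positions
a[5:] = [50, 60, 70, 80, 90]   # a sequence of length 5 fills a slice of length 5

failed = False
try:
    a[:5] = np.arange(3)       # 3 values cannot fill 5 positions
except ValueError as err:
    failed = True
    print("assignment failed:", err)

a[5:] -= 50                    # in-place arithmetic on part of the array
print(a)                       # [ 0  0  0  0  0  0 10 20 30 40]
```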

Multi-Dimensional Arrays with NumPy Slicing

You can slice a NumPy array in multiple dimensions by separating the indices with a comma:

array[row_start:row_stop:row_step, column_start:column_stop:column_step]

This works the same way as before, except that each dimension now gets its own start:stop:step, and the dimensions are separated by a comma. Let's see an example of NumPy slicing for multi-dimensional arrays:

# 2D array, we will learn about reshape soon, I promise
a = np.arange(9).reshape((3,3)) 
print(a)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
a[0] # first row, this is a shortcut for a[0,:]
array([0, 1, 2])
print(a[0])
print(a[0,:]) # these are equivalent 
# The second method is more explicit; it says, 
# "give me the first row and all the columns"
[0 1 2]
[0 1 2]
print(a[0,0]) # first element of first row
0
print(a[1,1]) # second element of second row
4
a[:,0] # first column, there is no shorthand for this
# this is saying, "give me all the rows and the first column"
array([0, 3, 6])
a[:2,:2] # first two rows and first two columns
array([[0, 1],
       [3, 4]])
a[:2,1:] # first two rows and last two columns
array([[1, 2],
       [4, 5]])
# You can use the same steps and strides with multi-dimensional arrays
a[::2,::2] # every other row and every other column
array([[0, 2],
       [6, 8]])
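
A point that often trips people up with 2D slicing is the shape of the result: an integer index removes that dimension, while a slice keeps it, even when the slice selects only one row or column. The sketch below illustrates this on the same 3x3 array; the variable names are invented for the example.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))

row = a[0, :]        # integer index drops the row axis -> 1D
row_2d = a[0:1, :]   # slice keeps the row axis -> 2D with one row
col = a[:, 0]        # integer index drops the column axis -> 1D
col_2d = a[:, 0:1]   # slice keeps the column axis -> 2D with one column
last_row_reversed = a[-1, ::-1]  # negative index and negative step also work

print(row.shape)          # (3,)
print(row_2d.shape)       # (1, 3)
print(col.shape)          # (3,)
print(col_2d.shape)       # (3, 1)
print(last_row_reversed)  # [8 7 6]
```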

Boolean NumPy Indexing

Most of the time, we don't want to meticulously type out the exact coordinates we want from a matrix. In fact, most of the time, we don't know the coordinates and we may want to select elements based on some condition. For example, we may want to select all the elements in a matrix that are greater than 5. This is where boolean indexing comes in.

a = np.arange(100).reshape((10,10)) # a bigger matrix
a
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
a[a>50] # all elements greater than 50
array([51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

Boolean Indexing Broken Down

So what happened in this example? It's actually a two-step process. First, we create a boolean array, and then we use that boolean array to index the original array. Let's see an example.

mask = a > 50  # step 1: create the boolean array
a[mask]        # step 2: use it to index the original array
mask = a > 50
mask # this is a boolean array: every element of a is compared to 50, giving True or False
array([[False, False, False, False, False, False, False, False, False,
        False],
      [False, False, False, False, False, False, False, False, False,
        False],
      [False, False, False, False, False, False, False, False, False,
        False],
      [False, False, False, False, False, False, False, False, False,
        False],
      [False, False, False, False, False, False, False, False, False,
        False],
      [False,  True,  True,  True,  True,  True,  True,  True,  True,
        True],
      [ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True],
      [ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True],
      [ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True],
      [ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True]])
# we can then use the boolean array to select 
# only the True elements of a (or any array of our choosing)

b = a[mask] # all elements greater than 50
b
array([51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
b[0] = 99 # change the first element of b
print(a) 
# a is unchanged because boolean indexing (masking) returns 
# a new 1D array, a copy of the original array, NOT a view.
print()
print(b)
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]
    
[99 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
 99]
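
Reading with a mask gives you a copy, but assigning through a mask is different: `a[mask] = value` writes into the original array. The following sketch, added here as an illustration with made-up variable names, shows both behaviors side by side.

```python
import numpy as np

a = np.arange(100).reshape((10, 10))
mask = a > 50

selected = a[mask]   # reading with a mask returns a copy
selected[0] = -1     # so this does not change a
print(a[5, 1])       # 51

a[mask] = 0          # assigning through a mask modifies a in place
print(a[5, 1])       # 0
print(a.max())       # 50
print(mask.sum())    # 49, the number of elements that were replaced
```

Counting the True values with `mask.sum()` is a quick way to check how many elements a condition selects before you overwrite anything.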

Combining Boolean Conditions

To combine boolean conditions, use the element-wise operators & (and) and | (or) instead of Python's and and or, and wrap each condition in parentheses. Let's see an example.

a
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
b = a[(a<90) and (a>50)] # this will not work, you need to use the bitwise operator & instead of the boolean operator and
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[27], line 1
----> 1 b = a[(a<90) and (a>50)] # this will not work. You need to use the bitwise operator & instead of the boolean operator and


ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
b = a[(a<90) & (a>50)] # this works, since it uses & (bitwise AND)
b
array([51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89])
b = a[(a>90) | (a<10)] # this also works, since it uses | (bitwise OR)
b
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 91, 92, 93, 94, 95, 96, 97,
       98, 99])
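
Two related details are worth a quick sketch: the ~ operator negates a mask, and the parentheses around each condition matter because & binds more tightly than the comparison operators. The example below is an illustration written for this section; `np.logical_and` is shown as an equivalent spelling of &, and the `missing_parens_failed` flag is only there to record the error.

```python
import numpy as np

a = np.arange(100).reshape((10, 10))

between = a[(a > 50) & (a < 90)]          # 51 through 89
same = a[np.logical_and(a > 50, a < 90)]  # function form of the same test
outside = a[~((a > 50) & (a < 90))]       # ~ flips True and False

missing_parens_failed = False
try:
    # Parsed as a > (50 & a) < 90, a chained comparison that uses `and`
    a[a > 50 & a < 90]
except ValueError:
    missing_parens_failed = True

print(between.size)                   # 39
print(np.array_equal(between, same))  # True
print(outside.size)                   # 61
print(missing_parens_failed)          # True
```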

Using Integer Arrays For NumPy Indexing

You can also use integer arrays for indexing. Let's see an example.

x = np.array([10, 20, 30, 40, 50])
print(x)
# selects the elements at indices 1, 3, and 4 (the 2nd, 4th, and 5th elements)
indices = np.array([1, 3, 4]) 
print(x[indices])  # Outputs: [20 40 50]
[10 20 30 40 50]
[20 40 50]
# you do not have to put things in order - you can put them in whatever order you want!
indices = np.array([4, 3, 1]) # selects the elements at indices 4, 3, and 1
print(x[indices])  # Outputs: [50 40 20]
[50 40 20]
y = np.array([[1, 2], [3, 4], [5, 6]])
print(y)
cols = np.array([1, 0, 1])
print()

print(y[:, cols])  
# here we are saying, "give me all the rows, but only the 
# columns listed in the cols array," which is [1, 0, 1], so give 
# me column 1, then column 0, then column 1 again
[[1 2]
 [3 4]
 [5 6]]
    
[[2 1 2]
 [4 3 4]
 [6 5 6]]

Note: Integer Array Indexing Is Confusing. Examples of integer array indexing are often hard to follow, as the one above shows. For now, it is enough to know that it exists.
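
If you want one more data point, the sketch below shows what happens when you pass an index array for each dimension: NumPy pairs them up element by element. It reuses the `y` array from above; the names `rows`, `picked`, and `reordered` are invented for this illustration.

```python
import numpy as np

y = np.array([[1, 2], [3, 4], [5, 6]])

rows = np.array([0, 1, 2])
cols = np.array([1, 0, 1])

# Index arrays are paired up: y[0, 1], y[1, 0], y[2, 1]
picked = y[rows, cols]
print(picked)  # [2 3 6]

# A single index array selects whole rows, in the order given
reordered = y[[2, 0]]
print(reordered.tolist())  # [[5, 6], [1, 2]]

# Like boolean indexing, integer array indexing returns a copy
print(np.shares_memory(y, picked))  # False
```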

Common Use Case for Boolean Indexing

Often, you want to apply a condition to one array and use the result to select the matching elements of another array. In these cases, you will end up using boolean indexing more often than not.

Let's imagine that we have two arrays. One holds some statistics about the students in our school; the other holds their grade averages.

# 12 random grades between 60 and 100
grades = np.random.randint(60, 101, 12) 
# 36 random stats between 0 and 99, reshaped into 12 rows and 3 columns
random_stats = np.random.randint(0, 100, 36).reshape(12,3) 
print(grades)
print()
print(random_stats) 
# you can imagine the stats columns are things like 
# "hours studied", "hours slept", "hours played video games"
[85 77 68 68 65 76 96 78 67 76 62 64]
    
[[ 5 18 22]
 [24 34 66]
 [ 4 95 57]
 [96 10 93]
 [26 78 73]
 [38 62 78]
 [ 9  4 87]
 [96 57 66]
 [ 1 24 94]
 [ 4 69 86]
 [57 76 55]
 [40 24 56]]
# Now we want to select the stats for the 
# students who got a B or better (80 or above)
# We can use boolean indexing to do this
# First, we need to create a boolean array that is 
# True for all the grades that are 80 or above
mask = grades >= 80
print(mask)
print()
# Now we can use the mask to select only the 
# rows of random_stats where the mask is True
print(random_stats[mask])

# These kinds of operations will happen a lot in 
# data science, so it is important to understand how they work. 
[ True False False False False False  True False False False False False]
    
[[ 5 18 22]
 [ 9  4 87]]
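
Once you have the mask, you can go a step further and summarize the selected rows. The sketch below hard-codes the values printed above so the result is reproducible (the lesson's own arrays are random and will differ on each run); the variable names other than `grades`, `random_stats`, and `mask` are assumptions made for this illustration.

```python
import numpy as np

# Values copied from the printed output above so the example is repeatable
grades = np.array([85, 77, 68, 68, 65, 76, 96, 78, 67, 76, 62, 64])
random_stats = np.array([
    [5, 18, 22], [24, 34, 66], [4, 95, 57], [96, 10, 93],
    [26, 78, 73], [38, 62, 78], [9, 4, 87], [96, 57, 66],
    [1, 24, 94], [4, 69, 86], [57, 76, 55], [40, 24, 56],
])

mask = grades >= 80

student_rows = np.flatnonzero(mask)             # row numbers where mask is True
top_grades = grades[mask]                       # the grades themselves
first_stat = random_stats[mask, 0]              # first column, selected rows only
column_means = random_stats[mask].mean(axis=0)  # average each stat over those rows

print(student_rows)   # [0 6]
print(top_grades)     # [85 96]
print(first_stat)     # [5 9]
print(column_means.tolist())  # [7.0, 11.0, 54.5]
```

The mask only works here because `grades` has one entry per row of `random_stats`; both must describe the same students in the same order.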

Summary: NumPy Indexing and NumPy Slicing

  • NumPy arrays can be indexed and sliced using standard Python syntax, including arr[index] for single elements, arr[start:stop:step] for slicing, and multi-dimensional indexing with comma-separated indices (e.g., arr[row, col]).

  • Boolean indexing allows selecting elements from arrays based on conditions, using boolean arrays created with comparison operations (e.g., mask = arr > 5) and then indexing with the boolean array (arr[mask]).

  • Integer arrays can also be used for indexing, providing a way to select specific elements from one array based on the indices in another array (e.g., x = arr[indices]).

  • Common use cases for NumPy indexing and NumPy slicing include selecting subsets of data based on conditions, combining data from multiple arrays, and restructuring or rearranging array elements, which are fundamental operations in data analysis and scientific computing.