7.2. NumPy arrays

Now that we know what packages are and why NumPy in particular is very useful to nanobiologists, let’s take a deeper look into NumPy. In this section, we will introduce NumPy arrays, a data structure provided by NumPy for working with vectors and matrices. We will first explore how to create them and then how to manipulate them.

7.2.1. Creating NumPy arrays

Until now, the variable types we have been working with in Python are int, float, complex, bool, str, list, tuple, and dict.

If we want to work with vectors, you may think that a list could come in handy as a vector-like data type. However, unlike vectors in mathematics, lists do not support element-wise arithmetic: two lists cannot be subtracted or multiplied with each other, and multiplying a list by an integer repeats the list instead of scaling its values. Therefore, we will introduce a new data type, the NumPy array, which is handy for nanobiology calculations and comes from the NumPy package (as the name suggests).
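To make the difference concrete, here is a minimal sketch comparing how the operators * and + act on a list and on a NumPy array. The variable names are chosen only for illustration, and np.array() itself is explained further down in this section.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

# For a list, * repeats the list and + glues two lists together
list_times_two = numbers_list * 2
list_plus_list = numbers_list + numbers_list

# For a NumPy array, the same operators act element by element
array_times_two = numbers_array * 2
array_plus_array = numbers_array + numbers_array

print(list_times_two)    # [1, 2, 3, 1, 2, 3]
print(list_plus_list)    # [1, 2, 3, 1, 2, 3]
print(array_times_two)   # [2 4 6]
print(array_plus_array)  # [2 4 6]
```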

NumPy arrays are a way to work in Python with not just single numbers, but a whole bunch of numbers. With NumPy arrays, these numbers can be manipulated just like in mathematics when you work with vectors. For example, a (column) vector \(\vec{a}\) with four elements might look like this:

\[\begin{split} \vec{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \end{split}\]

We will use NumPy arrays extensively in Python as vectors, like above, but also for storing, manipulating, and analyzing datasets (like a column of an Excel spreadsheet).

There are multiple functions in NumPy that allow us to create a NumPy array in Python. Let’s look at some of them.

7.2.1.1. np.array()

A simple way to create a NumPy array is to use the function np.array(). It makes an array from a comma-separated list of numbers in square brackets.

# Remember that we need to import NumPy if we want to use it
import numpy as np

# Create a NumPy array with 4 elements
a = np.array([1, 2, 3, 4])
print(a)

Python has a convenient function len() to check the size of your array:

print(len(a))

From len() we learned that our one-dimensional array \(a\) has a size of 4. Note that NumPy does not make a distinction between row vectors and column vectors for one-dimensional arrays: there are just vectors.
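One way to see that a one-dimensional array is neither a row nor a column is to look at its shape, which has only a single entry. The sketch below uses the array attributes .shape and .ndim, which are not otherwise used in this section.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

print(len(a))    # 4: the number of elements
print(a.shape)   # (4,): a single dimension with 4 elements
print(a.ndim)    # 1: the number of dimensions
```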

# Your code here

7.2.1.2. np.zeros(), np.ones(), np.eye()

If you want to specifically create an array of only ones, only zeros, or an array of zeros with ones on a diagonal, you can use np.zeros(), np.ones(), and np.eye(). Try it out! If you’re unsure how to use these functions, try to search for help with ?.

# Your code here
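If you want to compare your attempt with something, the following is one possible sketch. The size 3 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros(3)    # one-dimensional array of three zeros
ones = np.ones(3)      # one-dimensional array of three ones
identity = np.eye(3)   # 3x3 array with ones on the main diagonal, zeros elsewhere

print(zeros)
print(ones)
print(identity)
```

Note that all three functions return floating-point numbers by default, which is why the output shows 0. and 1. rather than 0 and 1.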

7.2.1.3. np.linspace()

To automatically generate an array with linearly increasing values, you can use np.linspace(). This function takes three arguments: the starting number, the ending number, and the number of points.

This is a bit like the range() function we saw before, but allows you to pick the total number of points, automatically calculating the (non-integer) step size you need.

a = np.linspace(0,20,40)
print(a)
print()
print("Length is: ", len(a))

Note that the step size here is not 0.5, as you may have expected for 40 points between 0 and 20. Because both the starting and the ending number are included, 40 points span only 39 intervals, so the step size is \(20/39 \approx 0.513\). If we want a step size of exactly 0.5, we actually need a total of 41 points:

a = np.linspace(0,20,41)
print(a)
print()
print("Length is: ", len(a))

# Your code here
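If you would rather not work out the step size by hand, np.linspace() can report it through its optional retstep argument. A short sketch, with variable names chosen for illustration:

```python
import numpy as np

# retstep=True makes np.linspace() return the array and the step size it used
points_40, step_40 = np.linspace(0, 20, 40, retstep=True)
points_41, step_41 = np.linspace(0, 20, 41, retstep=True)

print(step_40)  # 20/39, slightly larger than 0.5
print(step_41)  # 0.5
```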

7.2.1.4. np.arange()

If we want to have more control over the exact spacing of our points, we can use the np.arange() function. This function is like range(), asking you for the start, stop, and step size.

a = np.arange(0,20,0.5)
print(a)
print()
print("Length is: ", len(a))

Here, we already see a small quirk of np.arange(): it only includes points that are < (not <=) the stop point, so the stop value itself is left out. If we want to get a range that ends at 20.0, we need to make the stop point a bit bigger than 20 (but by less than our step size):

a = np.arange(0,20.00000001,0.5)
print(a)
print()
print("Length is: ", len(a))

For this reason, we do not use np.arange() very often, and mostly use np.linspace(). There are also several other useful functions, such as np.geomspace(), which produces geometrically spaced points (such that they are evenly spaced on a log scale).
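As a small illustration of np.geomspace(), the sketch below generates four points between 1 and 1000 and checks their spacing on a log scale. The numbers are arbitrary example values.

```python
import numpy as np

# Four points from 1 to 1000; each point is 10 times the previous one
g = np.geomspace(1, 1000, 4)
print(g)

# Taking the base-10 logarithm shows that the points are evenly spaced on a log scale
log_spacing = np.diff(np.log10(g))
print(log_spacing)
```

Like np.linspace(), np.geomspace() takes the start, the end, and the number of points, and includes both endpoints.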

# Your code here

7.2.1.5. np.random.random()

NumPy can also generate arrays of random numbers.

a = np.random.random(40)
print(a)

This will generate uniform random numbers in the range of 0 to 1, but there are also several other random number generator functions that can make normally distributed random numbers, random integers, and more.
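Two of these other generators are np.random.normal() and np.random.randint(). The sketch below shows one way to call them. The parameter values are arbitrary, and the output differs on every run because the numbers are random.

```python
import numpy as np

# Five normally distributed numbers with mean 0 and standard deviation 1
normal_numbers = np.random.normal(loc=0.0, scale=1.0, size=5)
print(normal_numbers)

# Five random integers from 0 up to (but not including) 10
random_integers = np.random.randint(0, 10, size=5)
print(random_integers)
```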

7.2.2. Multidimensional arrays

NumPy also supports two-dimensional (and generally \(N\)-dimensional) arrays. When talking about 2D arrays, you would typically think of matrices. Note that you can also think of column and row vectors as matrices in which the number of columns or the number of rows, respectively, is equal to 1. Sometimes it will be important to work specifically with row or column vectors (we see such examples in linear algebra).

How do we create 2D arrays in Python? Look at these examples of column and row vectors first, and pay close attention to the use of square brackets:

row_vector = np.array([[1, 2, 3, 4]])
print(row_vector)
print(np.shape(row_vector))
column_vector = np.array([[1], [2], [3], [4]])
print(column_vector)
print(np.shape(column_vector))

Note that for arrays with more than one dimension we should use np.shape() to check their size (len() only returns the size of the first dimension, i.e., the number of rows). In this example, our \(row\_vector\) has dimensions \((1,4)\), meaning that it has 1 row and 4 columns, and the other way around for \(column\_vector\).
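The sketch below puts len() and np.shape() side by side for the two vectors from the example, which are re-created here so that the block runs on its own. For a 2D array, len() reports only the number of rows, which is why np.shape() is the more informative check.

```python
import numpy as np

row_vector = np.array([[1, 2, 3, 4]])
column_vector = np.array([[1], [2], [3], [4]])

# len() only counts along the first dimension (the rows)
print(len(row_vector))          # 1
print(len(column_vector))       # 4

# np.shape() reports every dimension as (rows, columns)
print(np.shape(row_vector))     # (1, 4)
print(np.shape(column_vector))  # (4, 1)
```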

Important

In the array dimensions notation, the first number always denotes the number of rows and the second number is the number of columns:
\((rows, columns)\)
For example, \((5,8)\) denotes a 2D array with 5 rows and 8 columns.

Each element in an array has its own index, given by its row and its column. Important to remember: Python counts from zero, so row and column indices also start from zero. For example, the element in the first row and second column has index \((0,1)\), not \((1,2)\).
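As an illustration of zero-based indices, the sketch below reads single elements from a small 2D array with square brackets, written as m[row, column]. The array m and its values are made up for this example.

```python
import numpy as np

m = np.array([[3, 2, 1],
              [4, 5, 6]])

# First row, second column: indices 0 and 1, because counting starts at zero
first_row_second_col = m[0, 1]
print(first_row_second_col)   # 2

# Second row, third column: indices 1 and 2
second_row_third_col = m[1, 2]
print(second_row_third_col)   # 6
```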

Besides row and column vectors, we can also define matrices using np.array(). Let’s define a \((2,3)\) array and check its dimensions with np.shape().

d = np.array([[3,2,1],[4,5,6]])
print(d)
print(np.shape(d))

Note that you can also use other functions for defining multidimensional arrays, such as np.zeros():

m = np.zeros([5,5])
print(m)

7.2.3. Manipulating arrays

In some cases we want to add items to our array or combine different arrays into a single one. There are different ways to do that. To combine two NumPy arrays, we can make use of the np.concatenate() function. To add items at the end of our array, we can use the np.append() function.

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])
c = np.concatenate((a, b))
c = np.append(c, 11)
print(c)

even_nr = np.array([0, 2, 4, 6])
print(even_nr)

odd_nr = np.array([1, 3, 5, 7])
print(odd_nr)

# Add your code
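Two details of these functions are easy to miss: np.concatenate() accepts an axis argument for multidimensional arrays, and np.append() returns a new array instead of changing the original one. A minimal sketch with made-up example arrays:

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6]])

# axis=0 joins the arrays along the rows, so the result has shape (3, 2)
stacked = np.concatenate((top, bottom), axis=0)
print(stacked)

# np.append() returns a new array; the original array is left unchanged
original = np.array([1, 2, 3])
extended = np.append(original, 4)
print(original)   # [1 2 3]
print(extended)   # [1 2 3 4]
```

Because np.append() does not modify its input, the result has to be assigned to a variable, as in c = np.append(c, 11) above.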

7.2.4. Mathematical operations on NumPy arrays

As we hinted in previous sections, the advantage of using NumPy arrays for scientific computing is the way they behave under mathematical operations. In particular, they very often do exactly what we would want them to do if they were a vector:

a = np.array([1,2,3,4,5])
print(2*a)
print(a+a)
print(a+1)
print(a-a)
print(a/2)
print(a**2)

What about if we multiply two vectors together?

In mathematics, if we multiply two vectors, what we get depends on whether we use the “dot product” (also called the inner product) or the “outer product” for the multiplication. The dot product corresponds to multiplying a row vector by a column vector to produce a single number. The outer product (also called the tensor product) corresponds to multiplying the column vector by the row vector to make a matrix.

The dot product of two vectors \(\mathbf{a}\) and \(\mathbf{b}\), where:

\[\begin{split} \mathbf{a} = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \end{split}\]

is given by:

\[ \mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3 \]

For example, let \(\mathbf{a} = \begin{bmatrix} 2 & 3 & 4 \end{bmatrix}\) and \(\mathbf{b} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}\).

Then:

\[ \mathbf{a} \cdot \mathbf{b} = (2 \times 1) + (3 \times 0) + (4 \times -1) = 2 + 0 - 4 = -2 \]

The outer product of two vectors \(\mathbf{a}\) and \(\mathbf{b}\), where:

\[\begin{split} \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} b_1 & b_2 \end{bmatrix} \end{split}\]

produces a matrix \(\mathbf{a} \otimes \mathbf{b}\) as follows:

\[\begin{split} \mathbf{a} \otimes \mathbf{b} = \begin{bmatrix} a_1 b_1 & a_1 b_2 \\ a_2 b_1 & a_2 b_2 \\ a_3 b_1 & a_3 b_2 \end{bmatrix} \end{split}\]

For example, let \(\mathbf{a} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}\) and \(\mathbf{b} = \begin{bmatrix} 1 & -1 \end{bmatrix}\).

Then:

\[\begin{split} \mathbf{a} \otimes \mathbf{b} = \begin{bmatrix} 2 \times 1 & 2 \times -1 \\ 3 \times 1 & 3 \times -1 \\ 4 \times 1 & 4 \times -1 \end{bmatrix} = \begin{bmatrix} 2 & -2 \\ 3 & -3 \\ 4 & -4 \end{bmatrix} \end{split}\]

Question: If we type a*a, or more generally a*b, does Python use the dot product or the outer product?

It turns out: it uses neither! For NumPy arrays, the notation a*a produces what is commonly called the “element-wise” product, specifically:

a*b = [a[0]*b[0] a[1]*b[1] a[2]*b[2] ...]

We can see this here:

a = np.array([1,2,3,4,5])
print(a*a)

What if we actually want the dot product or the outer product? For that, NumPy has the functions np.dot() and np.outer():

print(np.dot(a,a))
print(np.outer(a,a))
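The two worked examples from earlier in this section can be checked the same way. In the sketch below, the names u, v, and w are chosen only to avoid overwriting the array a.

```python
import numpy as np

u = np.array([2, 3, 4])
v = np.array([1, 0, -1])
w = np.array([1, -1])

# Dot product: (2*1) + (3*0) + (4*-1)
dot_result = np.dot(u, v)
print(dot_result)     # -2

# Outer product: a matrix with 3 rows and 2 columns
outer_result = np.outer(u, w)
print(outer_result)
```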

Pretty much all operators work with NumPy arrays, even comparison operators, which can sometimes be very handy:

print(a)
print(a>3)

# Your code here
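One reason comparison results are handy is that the array of True and False values can be used to select or count elements. The sketch below assumes square-bracket selection with a boolean array, which is not covered in this section, and uses its own array so that a is not overwritten.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
mask = values > 3            # [False False False  True  True]

# Keep only the elements for which the condition is True
selected = values[mask]
print(selected)              # [4 5]

# True counts as 1 and False as 0, so the sum gives the number of matches
count = np.sum(mask)
print(count)                 # 2
```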

NumPy also has many other useful functions for performing calculations using arrays:

# Calculate the standard deviation
print(np.std(a))
# The square root
print(np.sqrt(a))
# Numpy also has max and min, here is an example of min
a = np.linspace(-10,10,100)
print(np.min(a**2))

A question for you to think about: why is this not zero? And what would we have to change to get the code to return zero?

In addition to finding the minimum value with np.min(), the function np.argmin() can tell you where (what index number) the minimum is.
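A short sketch of how np.min() and np.argmin() relate, using a small made-up array rather than the one from the question above:

```python
import numpy as np

x = np.array([4.0, 1.5, -2.0, 3.0])

smallest = np.min(x)            # the minimum value itself
where_smallest = np.argmin(x)   # the index at which the minimum occurs

print(smallest)                 # -2.0
print(where_smallest)           # 2 (the third element, counting from zero)
print(x[where_smallest])        # -2.0: indexing with the result gives the minimum back
```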

Other NumPy functions

There is an array (pun intended) of available NumPy functions besides the ones shown here, and we’re listing some of them just to give you an idea:

  • np.transpose() - transpose an array

  • np.sqrt(), np.exp(), np.log() - element-wise square root, exponential, and natural logarithm

  • np.sum(), np.mean(), np.median(), np.std() - calculate sum, mean, median, and standard deviation of elements

  • np.min(), np.max() - find minimum and maximum value

  • np.where() - return indices of elements that meet a condition

  • np.nonzero() - return indices of non-zero elements
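To give a feel for a few of the functions in this list, here is a brief sketch with made-up example data. Note that np.where() and np.nonzero() return a tuple with one index array per dimension, so [0] is used to pick out the indices of a one-dimensional array.

```python
import numpy as np

data = np.array([0, 3, 0, 5, 2])

# Indices of the elements larger than 2
idx_large = np.where(data > 2)[0]
print(idx_large)        # [1 3]

# Indices of the non-zero elements
idx_nonzero = np.nonzero(data)[0]
print(idx_nonzero)      # [1 3 4]

# Transposing swaps rows and columns: shape (2, 3) becomes (3, 2)
m = np.array([[1, 2, 3],
              [4, 5, 6]])
m_transposed = np.transpose(m)
print(np.shape(m_transposed))   # (3, 2)

# Sum and mean of all elements
print(np.sum(data))     # 10
print(np.mean(data))    # 2.0
```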