{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NumPy arrays\n", "\n", "```{admonition} Interactive page\n", ":class: warning, dropdown\n", "This is an interactive book page. Press the launch button at the top right.\n", "```\n", "\n", "Now that we know what packages are and why NumPy in particular is very useful to nanobiologists, let's take a deeper look into NumPy.\n", "In this section, we will introduce NumPy arrays, a data structure provided by NumPy for working with vectors and matrices. We will first explore how to create and manipulate them. \n", "\n", "## Creating NumPy arrays\n", "\n", "Until now, the variable types we have been working with in Python are `int`, `float`, `complex`, `bool`, `str`, `list`, `tuple`, and `dict`.\n", "\n", "If we want to work with vectors, you may think that `list` could come in handy as a vector-like data type. However, a `list` does not behave like a vector in mathematics under arithmetic operations such as multiplication and subtraction.\n", "Therefore, we will introduce a new data type - **NumPy array** - which is handy for nanobiology calculations and comes from the NumPy package (as the name suggests).\n", "\n", "NumPy arrays are a way to work in Python with not just single numbers, but a whole bunch of numbers. With NumPy arrays, these numbers can be manipulated just like in mathematics when you work with vectors. For example, a (column) vector $\\vec{a}$ with four elements might look like this:\n", "\n", "$$\n", "\\vec{a} = \\begin{bmatrix}\n", "1 \\\\\n", "2 \\\\\n", "3 \\\\\n", "4\n", "\\end{bmatrix}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use NumPy arrays extensively in Python as vectors, like above, but also for storing, manipulating, and analyzing datasets (like a column of an Excel spreadsheet). \n", "\n", "There are multiple functions in NumPy that allow us to create a NumPy array in Python. 
Let's look at some of them.\n", "\n", "\n", "### np.array()\n", "\n", "A simple way to create a NumPy array is to use the function `np.array()`. It makes an array from a comma-separated list of numbers in square brackets." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Remember that we need to import NumPy if we want to use it\n", "import numpy as np\n", "\n", "# Create a NumPy array with 4 elements\n", "a = np.array([1, 2, 3, 4])\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python has a convenient function `len()` to check the **size of your array**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(len(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From `len()` we learned that our one-dimensional array $a$ has a size of 4. Note that **for one-dimensional arrays, NumPy does not make a distinction between row vectors and column vectors: there are just vectors**.\n", "\n", "```{exercise}\n", ":class: dropdown\n", "Make: \n", "1. a `list`, and \n", "2. a 1D NumPy array \n", "\n", "in which the values 2, 3, 5 are stored. Multiply the list and the array by 2 and print the outcomes. What is the difference between the mathematical operation on a `list` and on a NumPy array?\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### np.zeros(), np.ones(), np.eye()\n", "\n", "If you specifically want to create an array of only zeros, an array of only ones, or an array of zeros with ones on the diagonal, you can use `np.zeros()`, `np.ones()`, and `np.eye()`, respectively. Try it out! If you're unsure how to use these functions, look up their help with `?`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### np.linspace()\n", "\n", "To automatically generate an array with evenly spaced values, you can use `np.linspace()`.\n", "This function [takes three arguments](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html): the starting number, the ending number, and the number of points.\n", "\n", "This is a bit like the `range()` function we [saw before](../chapter6/loops.ipynb), but allows you to pick the total number of points, automatically calculating the (non-integer) step size you need." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "a = np.linspace(0,20,40)\n", "print(a)\n", "print()\n", "print(\"Length is: \", len(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we made an array of 40 numbers between 0 and 20, you may have expected a step size of 0.5. It is not: both end points are included, so 40 points span 39 intervals and the step size is $20/39 \\approx 0.513$. If we want a step size of exactly 0.5, we actually need a total of 41 points:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "a = np.linspace(0,20,41)\n", "print(a)\n", "print()\n", "print(\"Length is: \", len(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{exercise}\n", ":class: dropdown\n", "Generate an array that runs from -2 to 1 with 20 points using `np.linspace()`.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### np.arange()\n", "\n", "If we want to have more control over the exact spacing of our points, we can use the `np.arange()` function. 
[This function](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) is like `range()`, asking you for the start, stop, and step size." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "a = np.arange(0,20,0.5)\n", "print(a)\n", "print()\n", "print(\"Length is: \", len(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we already see a small quirk of `np.arange()`: it only includes points that are `<` (not `<=`) the stop point, so the stop point itself is left out. If we want to get a range that ends at `20.0`, we need to make the stop point a bit bigger than 20 (but by less than our step size):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "a = np.arange(0,20.00000001,0.5)\n", "print(a)\n", "print()\n", "print(\"Length is: \", len(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this reason, we do not use `np.arange()` very often, and mostly use `np.linspace()`. There are also several other useful functions, such as `np.geomspace()`, which produces [geometrically spaced points](https://docs.scipy.org/doc/numpy/reference/generated/numpy.geomspace.html) (such that they are evenly spaced on a log scale). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{exercise}\n", ":class: dropdown\n", "Generate a NumPy array that has a first element with value 60 and last element 50 and takes steps of -0.5 between the values. \n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### np.random.random()\n", "\n", "NumPy can also generate arrays of random numbers." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "a = np.random.random(40)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This will generate uniform random numbers in the range of 0 to 1, but there are also several other random number generator functions that can make [normally distributed](https://en.wikipedia.org/wiki/Normal_distribution) random numbers, random integers, and more." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multidimensional arrays\n", "\n", "NumPy also supports two-dimensional (and generally $N$-dimensional) arrays. When talking about 2D arrays, you would typically think of **matrices**. Note that you can also think of **column and row vectors** as matrices, where either row or column number is equal to 1. Sometimes it will be important to work with specifically row or column vectors (we see such examples in linear algebra). \n", "\n", "How do we create 2D arrays in Python? 
Look at these examples of row and column vectors first, and pay close attention to the use of square brackets:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "row_vector = np.array([[1, 2, 3, 4]])\n", "print(row_vector)\n", "print(np.shape(row_vector))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "column_vector = np.array([[1], [2], [3], [4]])\n", "print(column_vector)\n", "print(np.shape(column_vector))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that for arrays with more than one dimension we have to use `np.shape()` to check their size (`len()` only returns the size of the first dimension, i.e., the number of rows).\n", "In this example, our $row\\_vector$ has dimensions $(1,4)$, meaning that it has 1 row and 4 columns, and the other way around for $column\\_vector$, which has dimensions $(4,1)$.\n", "\n", "```{important}\n", "In the array dimensions notation, the first number always denotes the number of rows and the second number is the number of columns:\\\n", "$(rows, columns)$ \\\n", "For example, $(5,8)$ denotes a 2D array with 5 rows and 8 columns.\n", "\n", "Each element in an array has its own **index**, e.g., in mathematical notation, element $(1,2)$ is the element in the first row and second column. \n", "Important to remember: **Python counts from zero**, so row and column indices also start from zero, and in Python this element has index `[0, 1]`.\n", "```\n", "\n", "Besides row and column vectors, we can also define matrices using `np.array()`. Let's define a $(2,3)$ array and check its dimensions with `np.shape()`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "d = np.array([[3,2,1],[4,5,6]])\n", "print(d)\n", "np.shape(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "thebe-remove-input-init" ] }, "outputs": [], "source": [ "import micropip\n", "await micropip.install(\"jupyterquiz\")\n", "from jupyterquiz import display_quiz\n", "import json\n", "\n", "with open(\"questions1.json\", \"r\") as file:\n", " questions=json.load(file)\n", " \n", "display_quiz(questions, border_radius=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that you can also use other functions for defining multidimensional arrays, such as `np.zeros()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "m = np.zeros([5,5])\n", "print(m)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "thebe-remove-input-init" ] }, "outputs": [], "source": [ "import micropip\n", "await micropip.install(\"jupyterquiz\")\n", "from jupyterquiz import display_quiz\n", "import json\n", "\n", "with open(\"questions2.json\", \"r\") as file:\n", " questions=json.load(file)\n", " \n", "display_quiz(questions, border_radius=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Manipulating arrays\n", "\n", "In some cases we want to add items to our array or combine different arrays into a single one. There are different ways to do that. To combine two NumPy arrays, we can make use of the `np.concatenate()` function. To add items at the end of our array, we can use the `np.append()` function." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "a = np.array([1, 2, 3, 4, 5])\n", "b = np.array([6, 7, 8, 9, 10])\n", "c = np.concatenate((a, b))\n", "c = np.append(c, 11)\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{exercise}\n", ":class: dropdown\n", "Below we have defined two arrays, one with even numbers and one with odd numbers. Combine these two arrays in a single array and [sort](https://numpy.org/doc/stable/reference/generated/numpy.sort.html) them.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "even_nr = np.array([0, 2, 4, 6])\n", "print(even_nr)\n", "\n", "odd_nr = np.array([1, 3, 5, 7])\n", "print(odd_nr)\n", "\n", "# Add your code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mathematical operations on NumPy arrays\n", "\n", "As we hinted in previous sections, the advantage of using NumPy arrays for scientific computing is the way they behave under mathematical operations. 
In particular, they very often do exactly what we would want them to do if they were a vector:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "a = np.array([1,2,3,4,5])\n", "print(2*a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(a+a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(a+1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(a-a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(a/2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(a**2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about if we multiply two vectors together? \n", "\n", "In mathematics, if we multiply two vectors, what we get depends on whether we use the \"dot product\" or the \"outer product\" for the multiplication (see [here](https://en.wikipedia.org/wiki/Row_and_column_vectors#Operations)).\n", "The dot product corresponds to multiplying a row vector by a column vector to produce a **single number**. The outer product (also called the tensor product) corresponds to multiplying the column vector by the row vector to make a **matrix**. 
\n", "\n", "The **dot product** of two vectors $\\mathbf{a}$ and $\\mathbf{b}$, where:\n", "\n", "$$\n", "\\mathbf{a} = \\begin{bmatrix} a_1 & a_2 & a_3 \\end{bmatrix} \\quad \\text{and} \\quad \\mathbf{b} = \\begin{bmatrix} b_1 \\\\ b_2 \\\\ b_3 \\end{bmatrix}\n", "$$\n", "\n", "is given by:\n", "\n", "$$\n", "\\mathbf{a} \\cdot \\mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3\n", "$$\n", "\n", "For example, let $\\mathbf{a} = \\begin{bmatrix} 2 & 3 & 4 \\end{bmatrix}$ and $\\mathbf{b} = \\begin{bmatrix} 1 \\\\ 0 \\\\ -1 \\end{bmatrix}$.\n", "\n", "Then:\n", "\n", "$$\n", "\\mathbf{a} \\cdot \\mathbf{b} = (2 \\times 1) + (3 \\times 0) + (4 \\times -1) = 2 + 0 - 4 = -2\n", "$$\n", "\n", "\n", "The **outer product** of two vectors $\\mathbf{a}$ and $\\mathbf{b}$, where:\n", "\n", "$$\n", "\\mathbf{a} = \\begin{bmatrix} a_1 \\\\ a_2 \\\\ a_3 \\end{bmatrix} \\quad \\text{and} \\quad \\mathbf{b} = \\begin{bmatrix} b_1 & b_2 \\end{bmatrix}\n", "$$\n", "\n", "produces a matrix $\\mathbf{a} \\otimes \\mathbf{b}$ as follows:\n", "\n", "$$\n", "\\mathbf{a} \\otimes \\mathbf{b} = \\begin{bmatrix} a_1 b_1 & a_1 b_2 \\\\ a_2 b_1 & a_2 b_2 \\\\ a_3 b_1 & a_3 b_2 \\end{bmatrix}\n", "$$\n", "\n", "For example, let $\\mathbf{a} = \\begin{bmatrix} 2 \\\\ 3 \\\\ 4 \\end{bmatrix}$ and $\\mathbf{b} = \\begin{bmatrix} 1 & -1 \\end{bmatrix}$.\n", "\n", "Then:\n", "\n", "$$\n", "\\mathbf{a} \\otimes \\mathbf{b} = \\begin{bmatrix} 2 \\times 1 & 2 \\times -1 \\\\ 3 \\times 1 & 3 \\times -1 \\\\ 4 \\times 1 & 4 \\times -1 \\end{bmatrix} = \\begin{bmatrix} 2 & -2 \\\\ 3 & -3 \\\\ 4 & -4 \\end{bmatrix}\n", "$$\n", "\n", "\n", "\n", "**Question: If we type `a*a`, or more generally `a*b`, does Python use the inner or outer product?**\n", "\n", "It turns out: it uses **neither!** In Python, the notation `a*a` produces what is commonly called the \"element-wise\" product, specifically:\n", "\n", "```\n", "a*b = [a[0]*b[0] a[1]*b[1] a[2]*b[2] ...]\n", "```\n", "\n", "We can see this here:" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "a = np.array([1,2,3,4,5])\n", "print(a*a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we actually want the dot product or the outer product? For that, Python has functions `np.dot()` and `np.outer()`: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(np.dot(a,a))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(np.outer(a,a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty much all operators work with NumPy arrays, even **comparison operators**, which can sometimes be very handy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(a)\n", "print(a>3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{exercise}\n", ":class: dropdown\n", "Generate a sequence of the first 20 powers of 2 in a NumPy array (starting at $2^0$). \n", "Your output should be an array $[2^0, 2^1, 2^2, 2^3, ...]$. 
 \n", "*(Hint: Start with a NumPy array created using an appropriate `range` function that makes an array [0,1,2,3,...].)*\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy also has many other useful functions for performing calculations using arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Calculate the standard deviation\n", "print(np.std(a))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# The square root\n", "print(np.sqrt(a))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# NumPy also has max and min, here is an example of min\n", "a = np.linspace(-10,10,100)\n", "print(np.min(a**2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A question for you to think about: why is this not zero? And what would we have to change to get the code to return zero? \n", "\n", "In addition to finding the minimum value with `np.min()`, the function `np.argmin()` can tell you **where** (at which index) the minimum is." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "thebe-remove-input-init" ] }, "outputs": [], "source": [ "import micropip\n", "await micropip.install(\"jupyterquiz\")\n", "from jupyterquiz import display_quiz\n", "import json\n", "\n", "with open(\"questions5.json\", \"r\") as file:\n", " questions=json.load(file)\n", " \n", "display_quiz(questions, border_radius=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Other NumPy functions\n", ":class: note\n", "There is an array (pun intended) of available NumPy functions besides the ones shown here, and we're listing some of them just to give you an idea:\n", "- `np.transpose()` - transpose an array\n", "- `np.sqrt()`, `np.exp()`, `np.log()` - element-wise square root, exponential, and natural logarithm\n", "- `np.sum()`, `np.mean()`, `np.median()`, `np.std()` - calculate sum, mean, median, and standard deviation of elements\n", "- `np.min()`, `np.max()` - find minimum and maximum value \n", "- `np.where()` - return indices of elements that meet a condition\n", "- `np.nonzero()` - return indices of non-zero elements\n", "```" ] } ], "metadata": { "jupytext": { "formats": "ipynb,md" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }