{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Vectorization\n", "\n", "```{admonition} Interactive page\n", ":class: warning, dropdown\n", "This is an interactive book page. Press the launch button at the top right of the page.\n", "```\n", "\n", "We can use `for` loops to iterate through NumPy arrays and perform calculations. For example, this code will calculate the average value of all the numbers in an array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "import numpy as np\n", "\n", "a = np.linspace(1, 20, 20)\n", "avg = 0\n", "\n", "for x in a:\n", "    avg += x\n", "\n", "avg /= len(a)\n", "\n", "print(\"Average is\", avg)" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-471d12979c1baf8e", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "```{admonition} /= operator\n", ":class: note, dropdown\n", "The `/=` operator is an **in-place division operator**. It is used to divide the value of a variable by another value and then assign the result back to the original variable, i.e., it is a short notation for `variable = variable / value`.\n", "```\n", "\n", "Because calculating an average is a common operation, there is a function built into NumPy, `np.average()` (as you may expect after [having seen](../chapter7/numpy-arrays.ipynb) `np.std()` and similar functions). Remember, the entire purpose of packages is that you don't need to reinvent commonly used code." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(\"Average is\", np.average(a))" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-48cd73f9aeaf481e", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "This is very handy: it saves us loads of typing. 
From the function name, it is also easy to understand what it's doing, making the code cleaner and easier to read. However, the purpose of NumPy functions is not only to save us lots of typing: they can also often **perform calculations much faster than if you code the calculation yourself with a `for` loop**.\n", "\n", "To show this quantitatively, we will use the `time` library to measure the time it takes to find the average of a pretty big array using both techniques (a `for` loop and `np.average()`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The time() function from the time library will return a floating point number representing\n", "# the number of seconds since January 1, 1970, 00:00:00 (UTC), typically with sub-second\n", "# precision (the exact resolution depends on your system)\n", "#\n", "# We will use this to make a note of the starting time and the ending time,\n", "# and then print out the time difference\n", "from time import time\n", "\n", "# A pretty big array, 50 million random numbers\n", "a = np.random.random(int(50e6))\n", "\n", "# Set timer\n", "t1 = time()\n", "\n", "# Calculate the average using a for loop\n", "avg = 0\n", "for x in a:\n", "    avg += x\n", "avg /= len(a)\n", "t2 = time()\n", "t_forloop = t2-t1\n", "print(\"The 'for' loop took %.3f seconds\" % t_forloop)\n", "\n", "# Calculate the average using np.average\n", "t1 = time()\n", "avg = np.average(a)\n", "t2 = time()\n", "t_np = t2-t1\n", "print(\"NumPy took %.3f seconds\" % t_np)\n", "\n", "# Now let's compare the two methods\n", "print(\"\\nNumPy was %.1f times faster!\" % (t_forloop/t_np))" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-bbb6d85899ba86b1", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Why is NumPy so much faster?** The reason is that Python is an interpreted language, as we saw [earlier](../chapter3/python-characteristics.ipynb). 
In each step of the `for` loop, the Python interpreter reads the next operation it has to perform, translates it into instructions for your computer processor, asks the processor to carry them out, and gets the result back. It then repeats this whole cycle for the next step, and the next, until the loop ends. \n", "\n", "If we did the same test in a compiled programming language like C, there would be little or no difference between using a library function and writing our own `for` loop. \n", "\n", "**When you use smart functions in Python libraries**, like (many of) those in NumPy, the library will actually **use code compiled in a language like C or Fortran** that is able to hand the whole calculation to your computer processor in **one step** and get all the data back in one step, instead of going back through the interpreter for every element. This makes Python nearly as fast as a compiled language like C or Fortran, as long as you are smart in how you use it and **avoid having \"manual\" `for` loops for large or long calculations**. \n", "\n", "Note: for **small calculations**, `for` loops written in Python are perfectly fine and very handy!\n", "\n", "Among programmers who work in interpreted languages, finding smart ways of getting what you need done using compiled library functions is often referred to as [**vectorization**](https://en.wikipedia.org/wiki/Array_programming). 
\n", "\n", "Note that even normal mathematical operators are actually vectorized functions when they operate on NumPy arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# This is actually a vectorized 'for' loop: it multiplies 50 million numbers by 5\n", "b = 5*a" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-2a60c21baccb3c7d", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Here is a nice example of a vectorized way of counting the number of times the number 5 occurs in a random sample of 100 integers between 0 and 20 (inclusive):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "nums = np.random.randint(0, 21, 100)\n", "print(\"There are %d fives\" % np.sum(nums == 5))" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-c8bc80849f5e0e0a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "To see how this works, we can look at the intermediate steps:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "nums = np.random.randint(0, 21, 100)\n", "print(nums)\n", "print(nums == 5)\n", "print(\"There are %d fives\" % np.sum(nums == 5))" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-1a3f30004c3e3ffb", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Note that in this case, `np.sum()` will convert the `bool` value `True` into `1` and `False` into `0` for calculating the sum, according to the standard conversion of `bool` types to `int` types. 
You can see this in action using the `astype()` method of NumPy arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(nums == 5)\n", "print((nums == 5).astype('int'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "thebe-remove-input-init" ] }, "outputs": [], "source": [ "import micropip\n", "await micropip.install(\"jupyterquiz\")\n", "from jupyterquiz import display_quiz\n", "import json\n", "\n", "with open(\"questions6.json\", \"r\") as file:\n", "    questions = json.load(file)\n", "\n", "display_quiz(questions, border_radius=0)" ] } ], "metadata": { "jupytext": { "formats": "ipynb,md" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }