{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Libraries, documentation, and NumPy\n", "\n", "```{admonition} Interactive page\n", ":class: warning, dropdown\n", "This is an interactive book page. Press the launch button at the top right.\n", "```\n", "\n", "NumPy (**Num**erical **Py**thon) is the fundamental **package for scientific computing** in Python.\n", "\n", "But what are Python packages, and why would we want to use NumPy in particular?\n", "\n", "## Python packages\n", "\n", "**Packages are building blocks of programming**.\n", "Packages consist of **reusable pieces of code** so that you do not have to program everything from scratch (and, thereby, you don't have to keep \"reinventing the wheel\"). \n", "Sometimes we refer to published packages as **libraries**.\n", "According to [Python packages](https://py-pkgs.org/01-introduction.html), there are over 350,000 Python packages (based on data from 2022), and the number keeps growing. This means that there is **a lot of code out there for you to reuse**!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Why use packages?\n", ":class: note, dropdown\n", "Imagine you are working with a vector of numbers and, using Python, you wish to calculate the average of the numbers in your vector.\n", "As you can imagine, many people before you have wanted to do the same thing in Python.\n", "\n", "To prevent programmers from reinventing the same pieces of code time and again, there are Python packages available, each containing pre-made code (such as code for calculating an average). Another good reason to use packages is computational efficiency: the code from packages often runs much faster than the code you would write yourself (often because the heavy computation is done in optimized, compiled code rather than in plain Python). 
More on this in a [later section on vectorization](vectorization.ipynb).\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "```{admonition} Useful packages\n", ":class: tip, dropdown\n", "Python packages often focus on a **specific problem or domain**. Sometimes several libraries offer options that will result in the same outcome, e.g., you can make an xy scatter plot with more than one library. \\\n", "Here we offer an overview of some widely used Python packages that will likely come in handy in your nanobiology studies and research. Packages that we will use **in this book** are shown in bold.\n", "* Plotting and visualization\n", " * **Matplotlib**\n", " * **seaborn** \n", " * Plotly \n", "* Scientific computing\n", " * **NumPy**\n", " * SciPy\n", "* Bioinformatics\n", " * Biopython\n", "* Data analysis\n", " * **pandas**\n", "* Machine learning\n", " * Scikit-learn\n", " * PyTorch\n", "* Utilities and tools\n", " * os\n", " * re\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Naming conflicts\n", ":class: warning\n", "**DO NOT** use variable names that are the same as function names or package names. Examples are `list`, `str`, `print`, or `numpy`. This can **override the original meaning** and cause **unexpected errors**.\n", "\n", "Similarly, **AVOID** naming your **Python files** (e.g., `numpy.py`) after libraries or third-party packages. Doing so can **confuse the import system** and **prevent you from using the real package** in your code.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Importing a package\n", "\n", "When you want to use a package in your code, you have to tell Python explicitly. We call this **importing a package** into the *namespace* of your kernel. 
A namespace is a container that holds a collection of function and variable names.\n", "\n", "You typically do this at the very beginning of your Python script by using a line such as this:\n", "\n", "```\n", "import numpy\n", "```\n", "\n", "If you have multiple packages to import, you write several import lines, one beneath the other.\n", "\n", "In order to use a package, you have to make sure it's actually **installed**. If you use Anaconda, it installs a lot of the packages and tools for you. In our case, we installed a more lightweight version called Miniconda (remember the [installation steps](../chapter2/installation.md)). We then [created a conda environment](../chapter2/setting-up-environment.md) from the file env.yaml containing all packages needed for this course. \n", "\n", "In case you want to install a package on your own, two popular tools to do that are `pip` and `conda`. In practice, installing a package (such as NumPy) is as simple as **opening your terminal and running one of these lines**:\n", "\n", "```\n", "conda install numpy\n", "```\n", "\n", "or \n", "\n", "```\n", "pip install numpy\n", "```\n", "\n", "```{admonition} Installation into your current environment\n", ":class: important\n", "Running `conda install numpy` installs NumPy into the currently active environment. If you want to install NumPy into a different environment, make sure to run `conda deactivate`, then `conda activate ` before running `conda install numpy`. \n", "```\n", "\n", "```{admonition} Windows vs. macOS/Linux\n", ":class: important\n", "When needed, run the package installation commands in your Bash terminal for macOS/Linux, and in the Anaconda Prompt terminal for Windows.\n", "```\n", "\n", "If you're unsure **whether a Python package is installed**, you can check it in your terminal using `pip list` or `conda list`. Check with `conda`: do you already have NumPy installed in `pf-env`?\n", "\n", "```{admonition} Pip vs. 
conda\n", ":class: note, dropdown\n", "Both `pip` and `conda` can be used to install packages, so what are the differences between them?\n", "There are [three differences](https://numpy.org/install/):\n", "1. `conda` is cross-language, meaning that it can also install non-Python libraries and tools, while `pip` is for Python only.\n", "2. `conda` installs from its own channels, while `pip` installs from the Python Package Index (PyPI). The latter is the largest collection of packages, but all popular ones are also available with `conda`.\n", "3. `conda` is an integrated tool for managing packages, dependencies, and environments, while with `pip` you may need other tools for dealing with environments or complex dependencies.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Packages contain functions\n", "\n", "But what does a package actually look like, and what's inside it?\n", "A package contains **functions**, with each function performing a specific **task**.\n", "A function typically also has a set of **arguments**, which allow you to pass information into the function to tailor what exactly it will do. 
\n", "Depending on the function, the arguments can be:\n", "- **mandatory** - you **have to** specify this for a function to work\n", "- **optional** - you **can** set this parameter yourself; if you don't, there is a **default** value that the function uses\n", "\n", "This is actually also true for Python's built-in functions, such as `print()` and `input()`, which we've seen earlier.\n", "\n", "\n", "#### Reading documentation\n", "\n", "When you encounter a new function, you will wonder \"how do I use it exactly?\" This is where **documentation** becomes immensely useful, and it's crucial to know **how to read it**.\n", "\n", "You can access documentation online or directly in Python (such as in code cells here in the book and in your VS Code).\n", "The online Python [documentation](https://docs.python.org/3/library/index.html) contains information about built-in functions and more. \n", "There is also dedicated package documentation, such as the one for NumPy, which has information about NumPy's functions.\n", "\n", "Let's look at an example of a NumPy function called `numpy.zeros()`. \n", "To learn which arguments this function takes, you can either type `?numpy.zeros` to ask Python here in the book or in your VS Code, or check this function's [online documentation](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html).\n", "If you do that, you will see\n", "\n", "```\n", "numpy.zeros(shape, dtype=float, order='C', *, like=None)\n", "```\n", "\n", "followed by explanations about each argument, as well as information on whether it's mandatory or optional (in `numpy.zeros()`, only the first one, `shape`, is mandatory). For each optional argument, you will also be able to read what its default value is. At the end, there are a few concrete examples of how to use this function.\n", "\n", "```{admonition} AI tip\n", ":class: ai\n", "Need a more detailed explanation of a function, or wish to see more examples of its usage? 
Refer to AI.\n", "```\n", "\n", "Let's try using `numpy.zeros()` ourselves:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# First we have to import numpy\n", "import numpy\n", "\n", "# Creating an array of 10 zeros\n", "numpy.zeros(10)\n", "\n", "# To see help for numpy.zeros, type ?numpy.zeros" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "````{admonition} Asterisk * in function descriptions\n", ":class: tip\n", "\n", "The asterisk `*` in a Python function description indicates that all the **following parameters are keyword-only**. This means they must be specified using their names when calling the function.\n", "\n", "For example, with our function `numpy.zeros(shape, dtype=float, order='C', *, like=None)`, if we want to use the last argument, we have to write:\n", "\n", "```\n", "numpy.zeros(5, int, 'C', like=None)\n", "```\n", "\n", "rather than just:\n", "\n", "```\n", "numpy.zeros(5, int, 'C', None)\n", "```\n", "\n", "Therefore, for the parameter `like`, which is listed after the asterisk, we have to explicitly write `like=`.\n", "\n", "````" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, when you wish to use a function from a specific library, such as `numpy.zeros()`, you need to specify the library by prefixing the function name with (in this case) `numpy.`.\n", "To save you from some typing, you can also tell Python how you want to refer to functions from a specific package. For NumPy, it's very typical to use this import line:\n", "\n", "```\n", "import numpy as np\n", "```\n", "\n", "which then allows you to invoke its functions using `np.`, e.g., `np.zeros()` rather than the longer `numpy.zeros()`.\n", "Try recreating the code from above, but using `np` for NumPy." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Your code here - create an array of 10 zeros with np.zeros" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Importing a single function\n", "\n", "When we import NumPy with `import numpy as np`, we get access to all its functions by adding `np.` in front of the function name.\n", "To see what the library contains, type `dir(np)`, which will return a list of all the names it provides (functions, but also classes, constants, and submodules).\n", "This list can be quite long.\n", "\n", "If you **need only a single function from a library**, there is a second commonly used way to import only that single function, e.g.:\n", "\n", "```\n", "from numpy import zeros\n", "```\n", "\n", "When you do this, the function `zeros()` will be available directly, without any prefix:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Import only the zeros function from NumPy\n", "from numpy import zeros\n", "\n", "# zeros() can now be called without the np. or numpy. prefix\n", "a = zeros(10)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you look around on the internet, you will also find people who do the following:\n", "\n", "```\n", "from numpy import *\n", "```\n", "\n", "(Remember: `*` is a [wildcard](../chapter3/bash-commands.ipynb) and here replaces a function with *any* name.)\n", "\n", "This will import all the functions from NumPy directly into the namespace with no `np.` or `numpy.` prefix. You might think: what a great idea, this will save me loads of typing! Instead of typing `np.array()` I could just type `array()`, and so on for tens of NumPy functions. \n", "\n", "While it's true that it will save typing, it also comes with a high risk: sometimes **different packages have functions that have the same name**, but do different things. A concrete example is the function `sqrt()`, which is available in both the `math` and `numpy` libraries. 
Unfortunately, `math.sqrt()` works only on single numbers and will give an error if you pass it a NumPy array with more than one element, whereas `numpy.sqrt()` works element-wise on whole arrays. \n", "\n", "If you import both of these libraries with the command above (using `*`), the second import will overwrite any functions of the same name from the first, and if you're not careful, you will forget which one you are using. This could cause your code to \"break\". It will also \"crowd\" your namespace: you suddenly have hundreds or even thousands of function names in it, instead of just the name of a library. \n", "\n", "For these reasons, it is generally **advised not to use `import *`**, and it is considered a **poor coding practice** in Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NumPy package\n", "\n", "As a nanobiologist, you will regularly work with scientific data or build computational simulations and models. \n", "A lot of the data will come in the form of arrays (think of vectors as 1D and matrices as 2D arrays), on which you will want to perform mathematical operations.\n", "In fact, the number of **problems in modern biological science** that use array representations is huge: from DNA datasets to neuronal networks to biomolecular networks, data and models are conveniently and powerfully represented as arrays and linear transformations (i.e., multiplying by matrices). Python's NumPy library was built precisely to aid in this kind of programming task.\n", "\n", "For a deeper dive into the significance of NumPy in the world of scientific programming and for a *great visual summary* of some of the fundamental NumPy array concepts, see [this publication](https://doi.org/10.1038/s41586-020-2649-2). 
You can also visit [NumPy's website](https://numpy.org/doc/stable/user/whatisnumpy.html) if you're curious to learn more.\n", "\n", "In the next sections, we will familiarize ourselves with NumPy and some of its functions.\n", "\n" ] } ], "metadata": { "jupytext": { "formats": "ipynb,md" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.18" } }, "nbformat": 4, "nbformat_minor": 4 }