7.1. Libraries, documentation, and NumPy

NumPy (Numerical Python) is the fundamental package for scientific computing in Python.

But what are Python packages, and why would we want to use NumPy in particular?

7.1.1. Python packages

Packages are the building blocks of programming. They consist of reusable pieces of code, so you do not have to program everything from scratch (in other words, you don’t have to keep “reinventing the wheel”). Published packages are sometimes also referred to as libraries. According to Python packages, there are over 350,000 Python packages (based on data from 2022), and the number keeps growing. This means that there is a lot of code out there for you to reuse!

7.1.1.1. Importing a package

When you want to use a package in your code, you have to tell Python so explicitly. We call this importing a package into the namespace of your kernel. A namespace is a container that holds a collection of names, such as the names of functions and variables.

You typically do this at the very beginning of your Python script by using a line such as this:

import numpy

If you have multiple packages to import, you typically write several import lines, one beneath the other.
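As a minimal sketch of what several import lines look like in practice, the example below imports two modules that ship with Python (math and statistics, chosen here only for illustration because they need no installation) and uses one name from each:

```python
# One import line per package, placed at the top of the script.
# math and statistics ship with Python, so no installation is needed.
import math
import statistics

radius = 2.0
area = math.pi * radius**2                   # a constant from math
mean_value = statistics.mean([1, 2, 3, 4])   # a function from statistics

print(area)
print(mean_value)
```

The same pattern applies to packages you install yourself, such as NumPy.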

In order to use a package, you have to make sure it’s actually installed. If you use Anaconda, it installs a lot of the packages and tools for you. In our case, we installed a more lightweight version called Miniconda (remember the installation steps), so we will need to install additional packages ourselves.

In case you want to install a package on your own, two popular tools to do that are pip and conda. In practice, installing a package (such as NumPy) is as simple as opening your terminal and running one of these lines:

conda install numpy

or

pip install numpy

If you’re unsure whether a Python package is installed, you can also simply check it in your terminal using pip list or conda list. Check with conda - do you already have NumPy installed?
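Besides pip list and conda list, you can also check from within Python itself. The sketch below uses importlib.util.find_spec from the standard library; the helper name is_installed is our own choice for this illustration and is not part of any package:

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed("math"))    # math ships with Python
print(is_installed("numpy"))   # depends on your own installation
```

Note that this checks the interpreter that runs the script, which may differ from the environment your terminal is using.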

Hands-on: installing NumPy on your laptop

If you type conda list in your terminal, you will notice that NumPy wasn’t part of our Miniconda installation. Therefore, if you try to run the line import numpy in VS Code, it will result in an error (a ModuleNotFoundError).

Let’s install NumPy from within the terminal by running: conda install numpy. When prompted, type in y for “yes” and press Enter. If you then rerun conda list after the installation is finished, you will find NumPy in the list.

In VS Code, select View > Command Palette. Then type in and select the Python: Select Interpreter command from the Command Palette. Select the Python with “base” from the offered options. Now you can import and use NumPy in your scripts in VS Code!

Note: we will use only a few basic packages throughout this book, which we’ll install in our “base” environment. To learn more about creating Python environments, see the dropdown box on Environments below.

7.1.1.2. Packages contain functions

But what does a package actually look like, and what’s inside it? A package contains functions, each performing a specific task. A function typically also has a set of arguments, which allow you to pass information into the function to tailor what exactly it will do. Depending on the function, the arguments can be:

  • mandatory - you have to specify this for a function to work

  • optional - you can set this parameter yourself; if you don’t, there is a default value that the function uses

This is actually also true for Python’s built-in functions, such as print() and input(), which we’ve seen earlier.
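To make the distinction concrete, here is a small sketch with a made-up function, greet (hypothetical, written only for this illustration), that has one mandatory and one optional argument. It also shows the optional sep argument of the built-in print():

```python
def greet(name, greeting="Hello"):
    """name is mandatory; greeting is optional and defaults to "Hello"."""
    return f"{greeting}, {name}!"


print(greet("Ada"))                  # uses the default greeting
print(greet("Ada", greeting="Hi"))   # overrides the default

# Built-in functions work the same way: sep is an optional argument of print()
print("a", "b", "c", sep="-")
```

Calling greet() without any argument raises a TypeError, because the mandatory argument name is missing.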

7.1.1.2.1. Reading documentation

When you encounter a new function, you will wonder “how do I use it exactly?” This is where documentation becomes immensely useful, and it’s crucial to know how to read it.

You can access documentation online or directly in Python (such as in code cells here in the book and in VS Code). The online Python documentation contains information about built-in functions and more. Packages also have their own dedicated documentation, such as the one for NumPy, which has information about NumPy’s functions.

Let’s look at an example of a NumPy function called numpy.zeros(). To learn which arguments this function takes, you can either type ?numpy.zeros in a code cell here in the book or in VS Code, or check this function’s online documentation. (The ? syntax is provided by IPython and Jupyter; in a plain Python session, help(numpy.zeros) shows the same information.) If you do that, you will see

numpy.zeros(shape, dtype=float, order='C', *, like=None)

followed by explanations of each argument, as well as information on whether it’s mandatory or optional (in numpy.zeros(), only the first one, shape, is mandatory). Depending on the argument, you will also be able to read what its default value is. At the end, there are a few concrete examples of how to use this function.

Let’s try using numpy.zeros() ourselves:

# First we have to import numpy
import numpy

# Creating an array of 10 zeros
numpy.zeros(10)

# To see help for numpy.zeros, type ?numpy.zeros
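Building on the signature shown above, the following sketch tries out the mandatory shape argument and the optional dtype argument. It assumes NumPy is installed; the variable names grid and counts are our own:

```python
import numpy

# shape can be a tuple to create a 2D array: 2 rows, 3 columns
grid = numpy.zeros((2, 3))

# dtype is optional; passing int replaces the default float
counts = numpy.zeros(4, dtype=int)

print(grid.shape, grid.dtype)
print(counts)
```

Leaving out dtype gives an array of floats, as the default dtype=float in the signature indicates.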

Asterisk * in function descriptions

The asterisk * in a Python function description indicates that all the following parameters are keyword-only. This means they must be specified using their names when calling the function.

For example, with our function numpy.zeros(shape, dtype=float, order='C', *, like=None), if we want to use the last argument, we have to write:

numpy.zeros(5, int, 'C', like=None)

rather than just:

numpy.zeros(5, int, 'C', None)

Therefore, for the parameter like which is listed after the asterisk, we have to explicitly write like=.
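You can see the same rule at work in a function you write yourself. The sketch below defines a hypothetical stand-in, make_zeros (not a NumPy function; its parameter names are invented for this illustration), with one keyword-only parameter after the asterisk:

```python
def make_zeros(length, value_type=float, *, label=None):
    """Hypothetical stand-in: parameters after * can only be passed by name."""
    zeros = [value_type(0)] * length
    return zeros, label


# Passing label by name works
zeros, label = make_zeros(3, int, label="counts")
print(zeros, label)

# Passing it by position does not: Python raises a TypeError
try:
    make_zeros(3, int, "counts")
except TypeError as error:
    print("TypeError:", error)
```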

In general, when you wish to use a function from a specific library, such as numpy.zeros(), you need to specify the library by prefixing the function name with the library name followed by a dot (in this case, numpy.). To save yourself some typing, you can also tell Python how you want to refer to a specific package. For NumPy, it’s very typical to use this import line:

import numpy as np

which then allows you to invoke its functions using np., e.g., np.zeros() rather than the longer numpy.zeros(). Try recreating the code from above, but using np for NumPy.

# Your code here - create an array of 10 zeros with np.zeros

7.1.1.3. Importing a single function

When we import NumPy with import numpy as np, we get access to all its functions by adding np. in front of the function name. To see what the library contains, type dir(np), which generates a list of all the names it provides (functions, but also constants and submodules). Note that after import numpy as np the library is known under the name np, so dir(numpy) would give an error. The list can be quite long.
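Because the full list is long, it can help to filter it. The sketch below applies dir() to the math module, used here only because it ships with Python; the same approach works for NumPy. The variable name public_names is our own:

```python
import math

# dir() returns every name defined in the module, as a list of strings.
# Names starting with an underscore are internal, so we filter them out.
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names))
print(public_names[:5])
```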

If you need only a single function from a library, there is a second commonly used way of importing, which brings in just that function, e.g.:

from numpy import zeros

When you do this, the function zeros() will be available directly, without any prefix:

from numpy import zeros
a = zeros(10)
print(a)

If you look around on the internet, you will also find people who do the following:

from numpy import *

(Remember: * is a wildcard, which here stands for every name that the package makes available.)

This will import all the functions from NumPy directly into the namespace with no np. or numpy. prefix. You might think: what a great idea, this will save me loads of typing! Instead of typing np.array() I could just type array(), and so on for tens of NumPy functions.

While it’s true that it will save typing, it also comes with a high risk: sometimes different packages have functions that have the same name, but do different things. A concrete example is the function sqrt(), which is available in both the math and numpy libraries. Unfortunately, math.sqrt() only works on single numbers and will give an error when applied to a NumPy array with more than one element.

If you import both of these libraries with the command above (using *), the second import will overwrite any same-named functions from the first, and if you’re not careful, you will lose track of which one you are using. This could cause your code to “break”. It will also “crowd” your namespace: you suddenly have hundreds or even thousands of names in it, instead of just the name of a library.

For these reasons, it is generally advised not to use import *, and it is considered a poor coding practice in Python.
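The overwriting described above can be reproduced with two modules that ship with Python, math and cmath (complex math), which both define sqrt(). This is a sketch of the mechanism using explicit single-function imports rather than import *, but the name clash arises in the same way:

```python
# Both math and cmath ship with Python and define sqrt().
from math import sqrt
print(sqrt(4))    # math version: returns a float

from cmath import sqrt   # silently replaces the name sqrt
print(sqrt(4))    # cmath version: returns a complex number

# Keeping the library prefix removes the ambiguity
import math
import cmath
print(math.sqrt(4), cmath.sqrt(4))
```

Python gives no warning when the second import replaces the first, which is why the prefixed form is the safer habit.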

7.1.2. NumPy package

As a nanobiologist, you will regularly work with scientific data or build computational simulations and models. A lot of the data will come in the form of arrays (think of vectors as 1D and matrices as 2D arrays), on which you will want to perform mathematical operations. In fact, the number of problems in modern biological science that use array representations is huge: from DNA datasets to neuronal networks to biomolecular networks, data and models are conveniently and powerfully represented as arrays and linear transformations (i.e., multiplying by matrices). Python’s NumPy library was built precisely to aid in this kind of programming task.
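As a small taste of what this looks like in code, the sketch below represents a vector as a 1D array and a matrix as a 2D array, and applies a linear transformation by multiplying them. It assumes NumPy is installed; the 90-degree rotation matrix is simply an illustrative choice:

```python
import numpy as np

# A vector is a 1D array and a matrix is a 2D array
vector = np.array([1.0, 0.0])
rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])   # rotates a 2D vector by 90 degrees

# The @ operator performs matrix multiplication
rotated = rotation @ vector
print(rotated)
```

The vector pointing along the x-axis ends up pointing along the y-axis.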

For a deeper dive into the significance of NumPy in the world of scientific programming and for a great visual summary on some of the fundamental NumPy array concepts, see this publication. You can also visit NumPy’s website if you’re curious to learn more.

In the next sections, we will familiarize ourselves with NumPy and some of its functions.