10.4. Exercises

In most of the exercises on this page, we don’t specify which Python package to use for plotting (Matplotlib or seaborn) or for importing data from files (NumPy or pandas). The choice is yours. It can be based on applicability (remember: for what type of data can you use NumPy?) or on personal preference (Matplotlib vs. seaborn).

Each exercise is marked with a difficulty level:

[*] Easy
[**] Moderate
[***] Advanced

The highest difficulty level is meant for students who wish to challenge themselves and/or have previous experience in programming.

Exercise 10.2 ([*] Plotting polynomial functions)

Download this exercise.

In this exercise you will practise how to plot functions with Python. Plotting functions is used, for example, to predict data trends or visualise mathematical results. The first exercise will be done for you as an example.

Example: A polynomial function

$f(x) = 3x^5 - 2x^2 - 3 $ in the interval [-1, 1].

Solution steps:

Define the function you want to plot.
Define the values for the x-axis.
Calculate the y-values using your function.
Use Matplotlib to visualise the results. Don’t forget to give the plot a title and annotate the axes.

import numpy as np
import matplotlib.pyplot as plt

def polynomial(x):    # We first define the function to be plotted
    return 3*x**5 - 2*x**2 - 3

x_axis = np.linspace(-1,1,100) # Define the x-axis within the limits

y_values = polynomial(x_axis) # Calculate the y-values

plt.plot(x_axis,y_values) # Plot the function
plt.title("Plot of a polynomial function") # Give the plot a title
plt.xlabel("x-values") # x-axis label
plt.ylabel("y-values") # y-axis label
plt.show()

Exercise A: A polynomial function - Do it yourself

Plot the polynomial function $f(x) = a\cdot x^3 + b$ for three different values of $a$ or $b$, with $x$ between -1 and 1. How does your chosen parameter influence the function?

# Your code here

Exercise B: Two functions in the same plot

Take the functions $f(x) = 3-2x$ and $g(x)=2+3x$, plot them in the same plot and find their intersection point by reading it off the graph. Choose the x and y bounds yourself such that the point of intersection is easy to read.

Remember to add a legend, a title and axis labels. You can add a legend by including label = "3 - 2x" in the parameters for plt.plot() (so plt.plot(*plotting parameters*,label = "3 - 2x")) and then adding a line plt.legend().
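
The labelling mechanism described above can be sketched with two unrelated curves. Sine and cosine are chosen here purely for illustration; they are not the functions from the exercise, and the titles and axis labels are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative x-range and functions (not the ones from the exercise)
x = np.linspace(0, 2 * np.pi, 200)

plt.plot(x, np.sin(x), label="sin(x)")   # the label text is picked up by the legend
plt.plot(x, np.cos(x), label="cos(x)")
plt.title("Two functions in one plot")
plt.xlabel("x")
plt.ylabel("function value")
legend = plt.legend()                    # draws a box listing all labelled lines
plt.show()
```

Lines plotted without a label argument do not appear in the legend.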

# Your code here

Exercise 10.3 ([**] Plotting different functions in nanobiology)

Download this exercise.

Exercise A1: Cell movement - MSD

A freely moving cell in 2D approximately follows the diffusion formula, which gives the mean squared distance (MSD) from the origin:

\[ d(t)^2 = 4\cdot D\cdot t \]

with $D$ the so-called diffusion constant. Let $D$ = 1 m²/s. Calculate and plot the MSD for the first 100 seconds of movement. Note that the unit of $d(t)^2$ is m².

# Your code here

Exercise A2: Cell movement - Fürth

A more accurate representation of a cell moving in 2D is Fürth’s formula,

\[ d(t)^2 = 4\cdot D \cdot (t-P\cdot(1-e^{-t/P})) \]

where $D$ and $P$ are the diffusion and persistence constants. For our purposes, we assume $D$ = 1 m²/s and $P$ = 15 s.

Calculate and plot the MSD for the first 100 s of movement. Compare with the previous plot.

# Your code here

Exercise B: The ideal gas law

The ideal gas law is as follows:

\[ p\cdot V = N \cdot k_{B} \cdot T \]

with $p$ the pressure, $V$ the volume of the gas, $N$ the particle number, $k_B$ the Boltzmann constant, and $T$ the temperature.

Write a function for the volume depending on the pressure and temperature (keeping the particle number as a constant).
Let $N$ = 1 and $k_B = 1.4 \cdot 10^{-23}$ J/K. Assume 1 atm pressure (101325 Pa). Plot the volume between 100 K and 1,000 K. Is the relationship as expected from the equation?
Let $N$ = 1 and $k_B = 1.4 \cdot 10^{-23}$ J/K. Assume room temperature (about 300 K). Plot the volume as a function of pressure between 1,000 Pa and 10,000 Pa. Is the relationship as expected?

# Your code here

Exercise C: Michaelis-Menten

Remember the Michaelis-Menten equation describing an enzymatic reaction:

\[ v =\frac{ V_{max}[S] }{K_M + [S]} \]

with $V_{max}=k_2 \cdot [E_{tot}]$ and $K_M = \frac{k_1+k_2}{k_{-1}}$. The variables here represent the reaction rate constants $k_1$, $k_{-1}$ and $k_2$, the total enzyme concentration $[E_{tot}]$, the substrate concentration $[S]$ and the maximum reaction velocity $V_{max}$.

Create a function for the Michaelis-Menten equation depending on substrate concentration $[S]$ and reaction constant $k_1$.
Compute and plot the reaction velocity as a function of $[S] \in [0,10]$ [mmol] for $k_1$ = 2 /s, $k_2$ = 1 mmol/s, $k_{-1}$ = 1 mmol/s and $E_{tot}$ = 2 mmol. Don’t forget to properly annotate the axes and give the correct units.
Now keep the substrate concentration constant at a value of 4 mmol and plot the Michaelis-Menten equation as a function of $k_1 \in [0,5]$. How does $k_1$ influence the reaction velocity?

# Your code here

Exercise 10.4 ([**] Random walk (making plots pretty))

Download this exercise.

Matplotlib offers several options for customising your plots. In this exercise, we will explore some of them.

We will work with a function $f(t)= \sqrt{4Dt}$ for the root mean squared distance (in micrometers) of a random walk, and with two averaged measurements of particles that deviate from the ideal mathematical function. We will create the data using the np.random.normal() command, which adds noise to our perfect function. np.random.normal() draws random numbers from a normal distribution: values close to the mean are the most likely, and the standard deviation sets how far the values typically spread.

import matplotlib.pyplot as plt
import numpy as np

time = np.linspace(0, 50, 50) # seconds
calculated_rmsd = np.sqrt(4 * 15 * time) # micrometers
measured_1 = calculated_rmsd + np.random.normal(0, 2, len(time)) # micrometers
measured_2 = calculated_rmsd + np.random.normal(0, 2, len(time)) # micrometers
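
Because the noise is random, the measured curves above look different on every run. As a side note, here is a minimal sketch of how the noise could be made repeatable with a seeded generator. np.random.default_rng is an alternative to the np.random.normal() call used in this exercise, and the seed value and sample size are arbitrary assumptions.

```python
import numpy as np

# A seeded generator returns the same "random" numbers on every run.
# The seed value 42 is arbitrary.
rng = np.random.default_rng(42)

# 1,000 draws from a normal distribution with mean 0 and standard deviation 2,
# the same parameters as the noise added above
noise = rng.normal(0, 2, 1000)

print(noise.shape)               # one value per requested draw
print(round(noise.mean(), 2))    # close to the mean of 0
print(round(noise.std(), 2))     # close to the standard deviation of 2
```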

First, create a regular plot of the three trajectories. Don’t forget to label the axes and create a legend.

# Your code here

There are several ways to enhance this plot.

Plot size: You can specify the plot size using the command plt.figure(figsize=(a,b)) before plotting, where a is the width and b the height of the figure in inches. Experiment with the size and choose one that suits your plot.
Colour: Alter the colour by adding color = "colour name" to the plt.plot() command. Check the list of named colours in Matplotlib.
Line width: Alter the line width by adding linewidth = a to the plt.plot() command, where a gives the width of the line in points.
Line style: Change the line style by adding linestyle = "style" to the plt.plot() command. Check the list of linestyles.
Special characters in axis labels: In this example, the y-axis has been labelled using plt.ylabel("distance [um]") to indicate that the axis measures distance in micrometers. However, we can also make the label show the character for micro (or almost any other character). For that, we use plt.ylabel(r"distance $[\mu m]$"). Here, we put the special characters between dollar signs ($), and then use \mu to get the Greek letter mu. Any other Greek letter works in the same way. For those familiar with LaTeX, we can also put equations into labels using LaTeX syntax.
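
The options listed above can be combined in a single plot. The following is a minimal sketch on made-up data (a damped oscillation, unrelated to the random walk); the specific size, colour, width and style values are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example data: a damped oscillation, unrelated to the random walk
t = np.linspace(0, 10, 200)                  # seconds
position = np.exp(-t / 4) * np.cos(2 * t)    # micrometers

fig = plt.figure(figsize=(8, 4))             # width and height in inches
(line,) = plt.plot(
    t,
    position,
    color="darkorange",                      # a named Matplotlib colour
    linewidth=2.5,                           # line width in points
    linestyle="--",                          # dashed line
    label="damped oscillation",
)
plt.xlabel("time [s]")
plt.ylabel(r"position $[\mu m]$")            # raw string keeps the backslash in \mu intact
plt.legend()
plt.show()
```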

# Your code for pretty matplotlib plot

Exercise 10.5 ([**] Types of plots)

Download this exercise.

In this exercise we will explore Matplotlib and the types of plots it can produce. Choosing the correct plot to represent your data is an important skill, and knowing the available plot types helps you make a good decision. The plot types we will discuss are: scatterplots, histograms, bar plots, boxplots, and line plots with errorbars.

Exercise A: Scatterplot

Make a scatterplot of the first ten Fibonacci numbers. Why is a scatterplot an appropriate plot to visualise these?

# Your code here

Exercise B: Histogram and bar plot

A research group conducts a survey. The scientists are interested in the age distribution of the respondents. The ages are [14, 14, 15, 16, 17, 19, 19, 20, 22, 22, 25, 26, 27, 27, 27, 30, 33, 33, 34, 34, 34, 35, 36, 37, 37, 39, 42, 42, 43, 44, 45, 45, 45, 47, 49, 50, 50, 51, 52, 53, 55, 56, 56, 56, 56, 57, 58, 85, 60, 63, 64, 66, 66, 69, 70, 73, 74, 76].

Part I. Make a histogram of the age distribution, with one bin each for ages 10 - 19, 20 - 29, and so on up to 70 - 79. Which age group was most represented in this research?

Part II. For this data, also create a bar plot. Note that for a bar plot, you need to provide the heights of the bars yourself! If you want an extra challenge, write a function that counts the number of participants in each age group for you. Otherwise, use your histogram or count by hand.
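
The difference between the two plot types can be sketched on a small made-up dataset (exam scores, invented for illustration). Here np.histogram is used as one possible way to obtain the counts; counting with your own function works just as well.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up exam scores, used only to illustrate the mechanics
scores = [12, 35, 37, 41, 48, 52, 55, 58, 61, 64, 66, 67, 73, 78, 91]

# Bin edges: [0, 20), [20, 40), [40, 60), [60, 80), [80, 100]
edges = [0, 20, 40, 60, 80, 100]

# np.histogram counts the values per bin without drawing anything
counts, _ = np.histogram(scores, bins=edges)
print(counts)   # [1 2 5 6 1]

plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)
plt.hist(scores, bins=edges, edgecolor="black")   # the histogram does the counting itself
plt.title("plt.hist")

plt.subplot(1, 2, 2)
bin_names = ["0-19", "20-39", "40-59", "60-79", "80-100"]
plt.bar(bin_names, counts)                        # the bar plot needs the heights
plt.title("plt.bar")

plt.tight_layout()
plt.show()
```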

# Your code here

Exercise C: Boxplot

Part I. In Python, we can generate a random integer between 10 and 79 using the command np.random.randint(10, 80) (the upper bound is excluded). Using a for loop, create a dataset of 50 participants with random ages between 10 and 79. Make a boxplot of this distribution.

Part II. Now, repeat this process to create four datasets of randomly distributed ages, and plot them as boxplots in the same plot. Compare the results: are they as expected given that we used a random generating process? Increase the number of participants to 2,000. Are these results more or less like what you expected? What does this tell you about the relationship between the number of participants in a study and the reliability of its results?
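
Placing several boxplots side by side can be sketched with a different random process. The example below uses simulated dice rolls rather than ages; the helper name roll_dice, the number of rolls and the tick labels are all made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: simulated rolls of a six-sided die (not the ages from the exercise)
def roll_dice(n_rolls):
    rolls = []
    for _ in range(n_rolls):
        rolls.append(np.random.randint(1, 7))   # upper bound is excluded: values 1 to 6
    return rolls

datasets = [roll_dice(30) for _ in range(3)]

plt.boxplot(datasets)                                 # one box per inner list
plt.xticks([1, 2, 3], ["run 1", "run 2", "run 3"])    # boxes sit at positions 1, 2, 3
plt.ylabel("die value")
plt.title("Three independent sets of 30 rolls")
plt.show()
```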

# Your code here

Exercise D: Lineplot with errorbars

A researcher measures how the temperature of a reaction deviates from room temperature over time. They take a measurement every second, starting at 0 s and ending at 20 s. Their results are (in K):

[304.1, 306.6, 308.7, 310.7, 313.2, 316.8, 320.4, 323.8, 326.3, 329.3, 331.1, 332.1, 333.3, 332.1, 331.1, 329.3, 326.3, 323.8, 320.4, 316.8, 313.2]

The machine used for measurements has an error of 5 %. Create a plot of the measured data and add errorbars to the measurement points.
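
Error bars that scale with the measured value can be sketched as follows. The data (a voltage over time) and the 10 % relative error are invented for this illustration and are not the values from the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up measurements, used only to illustrate plt.errorbar
time = np.arange(0, 6)                                  # seconds
voltage = np.array([1.0, 1.8, 2.9, 4.1, 5.2, 5.9])      # volts
error = 0.10 * voltage                                  # assumed 10 % relative error

# yerr sets how far each error bar extends above and below its point
plt.errorbar(time, voltage, yerr=error, marker="o", capsize=3)
plt.xlabel("time [s]")
plt.ylabel("voltage [V]")
plt.title("Measurements with error bars")
plt.show()
```

Since the error is relative, the bars grow with the measured value.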

# Your code here

Exercise 10.6 ([**] Puromycin)

Download this exercise.

Puromycin is an antibiotic that acts as a protein synthesis inhibitor. With this exercise, you received data of the reaction velocity depending on the substrate concentration in an enzymatic reaction, involving cells that were either untreated (control) or treated with puromycin. We will investigate the effect of puromycin in this experiment.

Exercise A: The data

You received the data file “Puromycin.csv” with this exercise. Use the terminal to explore its contents. Depending on what you discover in the file, choose an appropriate Python package and import the data into Python.

How many measurement points are there in total? Is the number the same for puromycin-treated and untreated cells?

Use Python to obtain these counts and use print statements to report your findings.
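
Counting rows per category can be sketched with pandas, assuming the data turns out to be a table with a column that labels the condition. The CSV text, column names and values below are hypothetical and do not reflect the actual layout of “Puromycin.csv”.

```python
import io
import pandas as pd

# Hypothetical CSV content; the column names and values are invented
# and do not reflect the actual layout of "Puromycin.csv"
csv_text = """dose,response,group
0.1,12,A
0.1,15,A
0.5,30,A
0.1,9,B
0.5,22,B
"""

data = pd.read_csv(io.StringIO(csv_text))    # a file path can be passed in the same way

total = len(data)                            # number of rows (measurements)
per_group = data["group"].value_counts()     # number of rows per category

print(f"Total number of measurements: {total}")
for name, count in per_group.items():
    print(f"Group {name}: {count} measurements")
```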

# Your code here

Exercise B: Plotting

You now have a better idea of what’s in your dataset. However, it’s still not straightforward to compare the two conditions (treated and untreated) by looking at a table. So let’s plot the data to see how puromycin affects the dependence of the reaction velocity (rate) on substrate concentration.

You have a few choices to make:

Which Python plotting package to use?
What is the best type of graph for this comparison?

You can also explore different options of packages and graphs. Independent of your coding choices, your graph should be well labelled. It should also be easy to read, i.e., it should be obvious from your graph what the effect of puromycin is.

# Your code here

Exercise 10.7 ([**] Diffusing particles (multiple plots in one))

Download this exercise.

Exercise A: Mean squared distance

The mean squared distance of a diffusing particle (random motion) in 2D can be calculated by

\[ d(t)^2 = 4 \cdot D \cdot t \]

Plot the root mean squared distance ( $\sqrt{d(t)^2}$ ) for $D$ = 10 m²/s, $D$ = 20 m²/s and $D$ = 30 m²/s for the first 100 s. Use a legend to annotate which particle is which.

What is the advantage of using one plot over multiple plots to visualise this?

# Your code here

Exercise B: Subplots

Take the function:

$ f(x) = 3x^4 + \frac{4}{x} $.

Plot the function and its first, second, and third derivatives in one plot, with $-1 < x < 1$. What do you see: are there any issues with this plot?
Instead, now plot the function and its first to third derivatives in a total of four subplots. What are the advantages of using subplots in this case?

Tip: you can use fig.tight_layout() to prevent overlap between subplot titles.
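
A grid of subplots with fig.tight_layout() can be sketched as follows. The four curves are illustrative stand-ins, not the function and derivatives from the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)

# Illustrative curves (not the function from the exercise)
curves = {
    "sin(x)": np.sin(x),
    "cos(x)": np.cos(x),
    "sin(2x)": np.sin(2 * x),
    "cos(2x)": np.cos(2 * x),
}

# A 2-by-2 grid of axes; axes is a 2D array, flatten() makes it easy to loop over
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, (name, y) in zip(axes.flatten(), curves.items()):
    ax.plot(x, y)
    ax.set_title(name)
    ax.set_xlabel("x")

fig.tight_layout()   # adjusts the spacing so titles and labels do not overlap
plt.show()
```

Each subplot has its own y-axis, so curves with very different ranges no longer hide each other.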

# Your code here

Exercise 10.8 ([**] Beavers II)

Download this exercise.

Fig. 10.12 American beaver. Photo source.

This exercise builds on the beavers exercise from the previous chapter (data source).

As a reminder: we are working with a small part of a study of the long-term temperature dynamics of the beaver Castor canadensis in north-central Wisconsin. Body temperature was measured by telemetry every 10 minutes for four females, but only data from one period of less than a day for two of the animals are used here.

In the previous beavers exercise, you combined the data of the two beavers into one table, which you saved on your computer. That file is the starting point of this exercise.

Exercise A: Import data into Python

Load your table that combines the data of two beavers into Python and perform some preliminary data exploration using Python:

Which columns are in the data? What types of values do they contain?
How many different animals’ data do you have?
How many different values are possible for activity outside retreat, and how many of each do you have in the data?
How many different days and timepoints are in the data?

# Your code here

Exercise B: Exploratory plotting

No matter which dataset you’re handling, once you are acquainted with it (as in Exercise A), the next thing you will always want to do is exploratory data analysis via plotting.

In this exercise, we are not giving you specific directions for plots you need to make. Instead, knowing what data you have in your beavers table, think about what plots could be interesting and make them using Python.

Based on your plots, what can you learn about beavers? Try to learn as much as possible about these cute rodents from your data.

# Your code here

Exercise 10.9 ([**] Air quality)

Download this exercise.

With this exercise, you received a file “airquality.csv”, containing daily air quality measurements in New York, May to September 1973 (source).

Check the data in the terminal, then import the file into Python and perform exploratory analysis of data contents.

Based on what you learn, decide which plots would be useful to learn more about the data and make them in Python. Go to the source of the data to learn more about the units of data in each column. Before plotting, transform your data such that it’s in units commonly used in Europe.

What can you learn about this data with plotting?

# Your code here

Exercise 10.10 ([**] Hair and eye color statistics)

Download this exercise.

With this exercise, you received a file “HairEyeColor.csv”, containing the distribution of hair and eye color among statistics students. The table comes from a survey of students at the University of Delaware (source).

Check the data in the terminal, then import the file into Python and perform exploratory analysis of data contents. Based on what you learn, decide which plots would be useful to learn more about the data and make them in Python.

What can you learn about this data with plotting?

# Your code here

Exercise 10.11 ([**] Cell location)

Download this exercise.

In this exercise on data analysis you will practise importing data into Python and applying simple mathematical operations to matrices and data frames.

Import “cell_location.txt” data into Python and print the first ten rows.

This dataset follows a single simulated cell as it diffuses. The two columns represent the x and y coordinates of the cell, respectively.

# Your code here

As you can see, the x and y coordinates are offset. Subtract this offset from the dataset and save the new data to a file called “cell_location_new.txt”.
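
Shifting a table of numbers and writing it to a text file can be sketched with NumPy. The array below is made up, the file name is a placeholder, and using the first row as the reference point is only one possible choice of offset.

```python
import os
import tempfile
import numpy as np

# Made-up 2D data: three rows, two columns
values = np.array([[10.0, 20.0],
                   [11.5, 19.0],
                   [12.0, 21.5]])

# Subtracting a single row shifts every column by its own constant
# (here the first row is the reference, which is an assumption for this sketch)
shifted = values - values[0]

# Write to a temporary folder so that no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), "shifted_example.txt")
np.savetxt(path, shifted)

# Reading the file back returns the same numbers
reloaded = np.loadtxt(path)
print(reloaded)
```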

# Your code here

Now, plot the trajectory of the cell from your new cell position data. Don’t forget to annotate the axes and generally make a visually pleasing plot.

# Your code here

What is the total distance travelled by the cell?

Exercise 10.12 ([**] Radioactive decay)

Download this exercise.

In this exercise you will simulate the radioactive decay of polonium, with a half-life of 388 years. Start with 1,000 radioactive atoms and follow the decay for 4,000 years, recording the current number of atoms every ten years.

Exercise A: Exponential

Plot the number of radioactive polonium atoms versus time, using $N(t) = N_0 \left(\frac{1}{2}\right)^{t/t_{1/2}}$, where:

$N(t)$: The number of radioactive atoms remaining at time t
$N_0$: The initial number of radioactive atoms
$t$: Time elapsed
$t_{1/2}$: The half-life of the substance

# Your code here

Exercise B: Stochastic

Radioactive decay is a stochastic process. Instead of following a smooth exponential exactly, each atom has a fixed chance to decay per year. For a half-life of 388 years, this chance is approximately $\ln(2)/388$, i.e. about 1/560 (a chance of 1/388 per year would correspond to a shorter half-life of about 269 years).

Write a for loop in which np.random.rand() is used to simulate how many atoms decay within each ten-year interval. Then plot the number of remaining atoms every ten years together with the theoretical result in one plot.

Don’t forget to add a legend. Add a second, zoomed-in plot of a section (use subplots!).

Note: np.random.rand() gives a random number between 0 and 1 from a uniform distribution. We can use it to simulate stochastic events by writing code that effectively says “do x only if np.random.rand() < np.log(2)/388”.
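
The “do x only if np.random.rand() is below a threshold” idea from the note can be sketched with a generic event. The probability of 0.25, the number of trials and the seed are arbitrary assumptions for this illustration; they are not the values from the exercise.

```python
import numpy as np

np.random.seed(0)       # optional: fixes the random numbers so the run is repeatable

p = 0.25                # assumed probability of the event in a single trial
n_trials = 10_000       # arbitrary number of trials

events = 0
for _ in range(n_trials):
    # np.random.rand() is uniform on [0, 1), so this condition is true
    # in a fraction p of the trials on average
    if np.random.rand() < p:
        events += 1

print(events / n_trials)   # should be close to p
```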

# Your code here

Exercise 10.13 ([**] DIY plotting)

If you want to challenge yourself further with importing, manipulating, and plotting data, you can download a lot of interesting public datasets from Kaggle’s website.

Exercise 10.14 ([***] Projectile motion)

Download this exercise.

Although it is nice (and valuable) to calculate the range of projectile motion by hand, it is much more convenient to write a program that allows you to study this motion and do various calculations within a few seconds. The basic equations are:

\[x = v_x \cdot t = v \cdot \cos(\theta) \cdot t \]

\[y = v_y \cdot t - 1/2 \cdot g \cdot t^2 = v \cdot \sin(\theta) \cdot t - 1/2 \cdot g \cdot t^2 \]

where $x$ and $y$ are the respective position components, $g$ is the gravitational acceleration, $t$ indicates time, $v$ is the initial speed of the throw and $\theta$ is the angle at which the ball was thrown.

Exercise A:

For $\theta$ = $\pi/3$ and $v$ = 100 m/s, plot the ($x,y$)-position of the projectile for the first 15 seconds.

# Your code here

Exercise B:

The above plot shows the projectile motion, but does not stop when the ball returns to the ground at $y$ = 0. You can solve the equations for the landing time $t_{max}$ and the range $x_{max}$:

\[t_{max} = \frac{2 v_y}{g} = \frac{2 v \sin(\theta)}{g} \]

\[x_{max} = v_x t_{max}=\frac{2 v_x v_y}{g} = \frac{2 v^2 \cos(\theta) \sin(\theta)}{g}\]
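
For reference, a short derivation of these two results, using only the symbols defined above. The ball is on the ground when $y = 0$:

\[ 0 = v_y \cdot t - \frac{1}{2} \cdot g \cdot t^2 = t \cdot \left( v_y - \frac{1}{2} \cdot g \cdot t \right) \quad \Rightarrow \quad t = 0 \ \text{ or } \ t = \frac{2 v_y}{g} \]

The solution $t = 0$ is the moment of the throw, so the landing time is the second solution. Substituting it into $x = v_x \cdot t$ gives the range $x_{max}$.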

Write a function which returns $x_{max}$ as a function of inputs $\theta$ and $v$.
Make an array with 20 evenly spaced angles $\theta$ in the $[0,\pi/2]$ domain, using numpy.linspace.
Plot $x_{max}$ as a function of $\theta$ for $v=10$ m/s.
Additionally, plot $x_{max}$ as a function of $\theta$ for $v=20$ m/s.

From the two plots, which angle gives the largest $x_{max}$?

# Your code here