10.4. Exercises
In most of the exercises on this page, we don't specify which Python package you should use for plotting (Matplotlib or seaborn) or for importing data from files (NumPy or pandas). The choice is up to you. You can base it on applicability (remember: for what type of data can you use NumPy?) or on personal preference (Matplotlib vs. seaborn).
Each exercise is marked with a difficulty level:
[*] Easy
[**] Moderate
[***] Advanced
The highest difficulty level is meant for students who wish to challenge themselves and/or have previous experience in programming.
([*] Plotting polynomial functions)
In this exercise you will practise plotting functions with Python. Plotting functions is useful, for example, to predict data trends or to visualise mathematical results. The first exercise is done for you as an example.
Example: A polynomial function
\(f(x) = 3x^5 - 2x^2 - 3 \) in the interval [-1, 1].
Solution steps:
Define the function you want to plot.
Define the values for the x-axis.
Calculate the y-values using your function.
Use Matplotlib to visualise the results. Don’t forget to give the plot a title and annotate the axes.
import numpy as np
import matplotlib.pyplot as plt
def polynomial(x): # We first define the function to be plotted
    return 3*x**5 - 2*x**2 - 3
x_axis = np.linspace(-1,1,100) # Define the x-axis within the limits
y_values = polynomial(x_axis) # Calculate the y-values
plt.plot(x_axis,y_values) # Plot the function
plt.title("Plot of a polynomial function") # Give the plot a title
plt.xlabel("x-values") # x-axis label
plt.ylabel("y-values") # y-axis label
Exercise A: A polynomial function - Do it yourself
Plot the polynomial function \(a\cdot x^3 + b\) for three values of \(a\) or \(b\), in the interval [-1, 1]. How does your chosen parameter influence the function?
# Your code here
Exercise B: Two functions in the same plot
Take the functions \(f(x) = 3-2x\) and \(g(x) = 2+3x\), plot them in one plot and find the intersection point by looking at the graph. Choose the x and y bounds yourself such that the point of intersection is easy to read.
Remember to add a legend, a title and axis labels. You can add a legend by including label = "3 - 2x" in the parameters for plt.plot() (so plt.plot(*plotting parameters*, label = "3 - 2x")) and then adding the line plt.legend().
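Here is a minimal sketch of the label-and-legend pattern. It plots two illustrative functions, \(\sin x\) and \(\cos x\), which are not the functions from the exercise:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)   # x-values shared by both curves

plt.figure()
plt.plot(x, np.sin(x), label="sin(x)")   # label= sets the text of the legend entry
plt.plot(x, np.cos(x), label="cos(x)")
plt.title("Two functions in one plot")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()   # collects all labelled lines into one legend
```

Each call to plt.plot() with a label= argument adds one legend entry. Lines plotted without a label are left out of the legend.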
# Your code here
([**] Plotting different functions in nanobiology)
Exercise A1: Cell movement - MSD
A freely moving cell in 2D approximately follows the diffusion formula, which gives the mean squared distance (MSD) from the origin:
\[ d(t)^2 = 4Dt \]
with \(D\) the so-called diffusion constant. Let \(D\) = 1 m²/s. Calculate and plot the MSD for the first 100 seconds of movement. Note that the unit of \(d(t)^2\) is m².
# Your code here
Exercise A2: Cell movement - Fürth
A more accurate representation of a 2D moving cell is Fürth's formula,
\[ d(t)^2 = 4D\left(t - P\left(1 - e^{-t/P}\right)\right) \]
where \(D\) and \(P\) are the diffusion and persistence constants. For our purposes, we assume \(D\) = 1 m²/s and \(P\) = 15 s.
Calculate and plot the MSD for the first 100 s of movement. Compare with the previous plot.
# Your code here
Exercise B: The ideal gas law
The ideal gas law is:
\[ pV = Nk_BT \]
with \(p\) the pressure, \(V\) the volume of the gas, \(N\) the particle number, \(k_B\) the Boltzmann constant, and \(T\) the temperature.
Write a function for the volume as a function of pressure and temperature, keeping the particle number constant.
Let \(N\) = 1 and \(k_B\) = 1.4 \(\cdot\) 10\(^{-23}\) J/K. Assume a pressure of 1 atm (101,325 Pa). Plot the volume between 100 K and 1,000 K. Is the relationship as expected from the equation?
Let \(N\) = 1 and \(k_B\) = 1.4 \(\cdot\) 10\(^{-23}\) J/K. Assume room temperature (about 300 K). Plot the volume as a function of pressure between 1,000 Pa and 10,000 Pa. Is the relationship as expected?
# Your code here
Exercise C: Michaelis-Menten
Remember the Michaelis-Menten equation describing an enzymatic reaction:
\[ v = \frac{V_{max}[S]}{K_M + [S]} \]
with \(V_{max}=k_2 \cdot [E_{tot}]\) and \(K_M = \frac{k_1+k_2}{k_{-1}}\). Here \(v\) is the reaction velocity, and the remaining variables represent the reaction rate constants \(k_1\), \(k_{-1}\) and \(k_2\), the total enzyme concentration \([E_{tot}]\), the substrate concentration \([S]\) and the maximum reaction velocity \(V_{max}\).
Create a function for the Michaelis-Menten equation depending on substrate concentration \([S]\) and reaction constant \(k_1\).
Compute and plot the reaction velocity as a function of \([S] \in [0, 10]\) mmol for \(k_1\) = 2 /s, \(k_2\) = 1 mmol/s, \(k_{-1}\) = 1 mmol/s and \([E_{tot}]\) = 2 mmol. Don't forget to annotate the axes properly and give the correct units.
Now keep the substrate concentration constant at 4 mmol and plot the reaction velocity as a function of \(k_1 \in [0, 5]\). How does \(k_1\) influence the reaction velocity?
# Your code here
([**] Random walk (making plots pretty))
Matplotlib offers many ways to personalise your plots. In this exercise, we will explore some of these options.
We will work with the function \(d(t) = \sqrt{4Dt}\) for the root mean squared distance (in micrometers) of a random walk, and with two averaged measurements of particles, which deviate from the ideal mathematical function. We will create the measurement data using np.random.normal(), which adds noise to our perfect function. np.random.normal(loc, scale, size) draws random numbers from a normal (Gaussian) distribution with mean loc and standard deviation scale.
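This small sketch shows what np.random.normal() returns on its own. The seed and the sample size of 10,000 are arbitrary choices for illustration. The mean (0) and standard deviation (2) match the noise used in the code below:

```python
import numpy as np

np.random.seed(0)   # fix the seed so the "random" numbers are reproducible

# 10,000 draws from a normal distribution with mean 0 and standard deviation 2
noise = np.random.normal(0, 2, 10000)

print(noise.shape)                # one value per requested sample
print(noise.mean(), noise.std())  # close to 0 and 2, but not exactly
```

The sample mean and standard deviation come out close to 0 and 2, but not exactly. With fewer samples, such as the 50 time points below, the deviations are larger.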
import matplotlib.pyplot as plt
import numpy as np
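time = np.linspace(0, 50, 50)                # 50 time points between 0 and 50
calculated_rmsd = np.sqrt(4 * 15 * time)     # ideal RMSD curve with D = 15
measured_1 = calculated_rmsd + np.random.normal(0, 2, len(time))  # ideal curve + noise
measured_2 = calculated_rmsd + np.random.normal(0, 2, len(time))  # second noisy "measurement"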
First, create a regular plot of the three trajectories. Don’t forget to label the axes and create a legend.
# Your code here
There are several ways to enhance this plot:
Plot size: you can specify the plot size using the command plt.figure(figsize=(a, b)) before plotting, where a and b give the width and height of the figure in inches. Experiment with the size of your plot and choose an appropriate one.
Colour: alter the colour by adding color = "colour name" to the plt.plot() command. Check the list of named colours in Matplotlib.
Line width: alter the line width by adding linewidth = a to the plt.plot() command, where a gives the width of the line in points.
Line style: change the line style by adding linestyle = "style" to the plt.plot() command. Check the list of line styles.
Special characters in axis labels: suppose the y-axis has been labelled using plt.ylabel("distance [um]") to indicate that the axis measures distance in micrometers. We can also make the label show the character for micro (or almost any other character). For that, we use plt.ylabel(r"distance $[\mu m]$"). Here, we put the special characters between dollar signs ($) and use \mu to get the Greek letter mu. The r in front of the string makes it a raw string, so Python passes the backslash on to Matplotlib unchanged. Similarly, we can use any other Greek letter. For those familiar with LaTeX, we can also put equations into labels using LaTeX notation.
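This sketch combines these options on the ideal curve from above. The colour "darkblue", the line width of 2.5 points and the dashed style are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

time = np.linspace(0, 50, 50)
calculated_rmsd = np.sqrt(4 * 15 * time)   # ideal curve, as in the code above

plt.figure(figsize=(8, 5))   # width and height in inches
plt.plot(time, calculated_rmsd, color="darkblue", linewidth=2.5,
         linestyle="--", label="calculated")
plt.xlabel("time")
plt.ylabel(r"distance $[\mu m]$")   # raw string: the backslash reaches Matplotlib intact
plt.legend()
```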
# Your code for pretty matplotlib plot
([**] Types of plots)
In this exercise we will explore Matplotlib and the types of plots it can produce. Choosing the correct plot to represent your data is an important skill, and knowing the available plot types helps you make a good decision. The plot types we will discuss are the scatterplot, histogram, bar plot and boxplot. We will also add error bars to a line plot.
Exercise A: Scatterplot
Make a scatterplot of the first ten Fibonacci numbers. Why is a scatterplot an appropriate plot to visualise these?
# Your code here
Exercise B: Histogram and bar plot
A research group conducts a questionnaire. The scientists are interested in the age distribution of the respondents. The ages are [14, 14, 15, 16, 17, 19, 19, 20, 22, 22, 25, 26, 27, 27, 27, 30, 33, 33, 34, 34, 34, 35, 36, 37, 37, 39, 42, 42, 43, 44, 45, 45, 45, 47, 49, 50, 50, 51, 52, 53, 55, 56, 56, 56, 56, 57, 58, 85, 60, 63, 64, 66, 66, 69, 70, 73, 74, 76].
Part I. Make a histogram of the age distribution, with one bin each for ages 10-19, 20-29, etc., up to 70-79. Which age group was most represented in this research?
Part II. For this data, also create a bar plot. Note that for a bar plot, you need to give the heights of the bars yourself! If you want an extra challenge, write a function that counts the number of participants in each age group. Otherwise, use your histogram or count by hand.
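Here is a sketch of both plot types on a small made-up list of ages, not the questionnaire data. The bin edges 10, 20, ..., 80 give exactly the bins 10-19, ..., 70-79 for integer ages. np.histogram is used here as one way to obtain the bar heights:

```python
import numpy as np
import matplotlib.pyplot as plt

ages = [12, 18, 23, 25, 31, 34, 38, 41, 47, 52, 55, 68]   # made-up example data

edges = np.arange(10, 90, 10)   # bin edges 10, 20, ..., 80

# Histogram: Matplotlib counts the values per bin for you
plt.figure()
plt.hist(ages, bins=edges, edgecolor="black")
plt.xlabel("age [years]")
plt.ylabel("number of respondents")

# Bar plot: you supply the bar heights yourself, here taken from np.histogram
counts, _ = np.histogram(ages, bins=edges)
labels = [f"{lo}-{lo + 9}" for lo in edges[:-1]]   # "10-19", "20-29", ...
plt.figure()
plt.bar(labels, counts)
plt.xlabel("age group [years]")
plt.ylabel("number of respondents")
```

Both plt.hist() and np.histogram() ignore values that fall outside the first and last bin edge.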
# Your code here
Exercise C: Boxplot
Part I. In Python, we can generate a random integer between 10 and 79 using the command np.random.randint(10, 80) (the upper bound is exclusive). Using a for loop, create a dataset of 50 participants with random ages between 10 and 79. Make a boxplot of this distribution.
Part II. Now repeat this process to create four datasets of randomly distributed ages, and plot them as boxplots in the same plot. Compare the results: are they as expected, given that we used a random process? Increase the number of participants to 2,000. Are these results closer to what you expected? What does this tell you about the relationship between the number of participants in a study and the reliability of its results?
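Here is one possible sketch of the loop-and-boxplot workflow. random_ages is a hypothetical helper name. No seed is set, so each run gives different datasets:

```python
import numpy as np
import matplotlib.pyplot as plt

def random_ages(n):
    """Return a list of n random integer ages from 10 to 79 (inclusive)."""
    ages = []
    for _ in range(n):
        ages.append(np.random.randint(10, 80))   # upper bound 80 is exclusive
    return ages

datasets = [random_ages(50) for _ in range(4)]   # four independent samples

plt.figure()
plt.boxplot(datasets)   # one box per dataset, side by side
plt.xlabel("dataset")
plt.ylabel("age [years]")
```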
# Your code here
Exercise D: Lineplot with errorbars
A researcher measures the temperature of a reaction over time. They measure every second, starting at 0 s and ending at 20 s. Their results are (in K):
[304.1, 306.6, 308.7, 310.7, 313.2, 316.8, 320.4, 323.8, 326.3, 329.3, 331.1, 332.1, 333.3, 332.1, 331.1, 329.3, 326.3, 323.8, 320.4, 316.8, 313.2]
The measuring device has an error of 5 %. Create a plot of the measured data and add error bars to the measurement points.
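Here is a sketch of plt.errorbar() with a 5 % relative error. It uses a few made-up temperatures rather than the measurements above:

```python
import numpy as np
import matplotlib.pyplot as plt

time = np.arange(0, 5)                                        # made-up time points [s]
temperature = np.array([300.0, 302.5, 305.0, 304.0, 301.0])   # made-up values [K]
error = 0.05 * temperature                                    # 5 % relative error per point

plt.figure()
plt.errorbar(time, temperature, yerr=error, fmt="o-", capsize=3)   # markers joined by lines
plt.xlabel("time [s]")
plt.ylabel("temperature [K]")
```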
# Your code here
([**] Puromycin)
Puromycin is an antibiotic that acts as a protein synthesis inhibitor. With this exercise, you received data on the reaction velocity as a function of the substrate concentration in an enzymatic reaction, involving cells that were either untreated (control) or treated with puromycin. We will investigate the effect of puromycin in this experiment.
Exercise A: The data
You received the data file “Puromycin.csv” with this exercise. Use the terminal to explore its contents. Depending on what you find in the file, choose an appropriate Python package and import the data into Python.
How many measurement points are there in total? Is the number the same for puromycin-treated and untreated cells?
Use Python to compute these counts and use print statements to report your findings.
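Here is a sketch of counting measurements per condition with pandas. The tiny table and its column names (conc, rate, state) are assumptions made for illustration. Check the header of Puromycin.csv in the terminal first. For the real file, read it with pd.read_csv("Puromycin.csv") instead of the inline text:

```python
import io
import pandas as pd

# Made-up stand-in for the real file; the column names are assumptions
csv_text = """conc,rate,state
0.1,1.0,treated
0.2,2.0,treated
0.1,1.5,untreated
0.2,2.5,untreated
0.3,3.0,untreated
"""
data = pd.read_csv(io.StringIO(csv_text))

print("Total number of measurements:", len(data))
counts = data["state"].value_counts()   # number of rows per condition
for state, n in counts.items():
    print(f"Number of {state} measurements: {n}")
```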
# Your code here
Exercise B: Plotting
Now you have a better idea of what's in your dataset. However, it's still not easy to compare the two conditions (treated and untreated) by looking at a table. So let's plot the data to see how puromycin affects the reaction velocity (rate) as a function of substrate concentration.
You have a few choices to make:
Which Python plotting package to use?
What is the best type of graph for this comparison?
You can also explore different options of packages and graphs. Independent of your coding choices, your graph should be well labelled. It should also be easy to read, i.e., it should be obvious from your graph what the effect of puromycin is.
# Your code here
([**] Diffusing particles (multiple plots in one))
Exercise A: Mean squared distance
The mean squared distance of a diffusing particle (random motion) in 2D can be calculated by
\[ d(t)^2 = 4Dt \]
Plot the root mean squared distance (\(\sqrt{d(t)^2}\)) for \(D\) = 10 m²/s, \(D\) = 20 m²/s and \(D\) = 30 m²/s for the first 100 s. Use a legend to indicate which particle is which.
What is the advantage of using one plot over multiple plots to visualise this?
# Your code here
Exercise B: Subplots
Take the function:
\( f(x) = 3x^4 + \frac{4}{x} \).
Plot the function and its first, second and third derivatives in one plot, with -1 < x < 1. What do you see: are there any issues with this plot?
Now plot the function and its first to third derivatives in four separate subplots instead. What are the advantages of using subplots in this case?
Tip: you can use fig.tight_layout() to prevent overlap between subplot titles.
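Here is a sketch of a 2 × 2 grid of subplots. It uses \(\sin x\) and its derivatives as a stand-in, so it does not solve the exercise for \(f(x) = 3x^4 + 4/x\):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 200)
# sin(x) and its first three derivatives, used only as an illustration
curves = {
    "f(x) = sin(x)": np.sin(x),
    "f'(x) = cos(x)": np.cos(x),
    "f''(x) = -sin(x)": -np.sin(x),
    "f'''(x) = -cos(x)": -np.cos(x),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6))   # 2 x 2 grid of axes
for ax, (title, y) in zip(axes.flat, curves.items()):
    ax.plot(x, y)
    ax.set_title(title)
    ax.set_xlabel("x")
fig.tight_layout()   # adjust spacing so titles and labels do not overlap
```

Each subplot gets its own y-axis range, so curves with very different magnitudes stay readable.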
# Your code here
([**] Beavers II)
This exercise builds on the beavers exercise from the previous chapter (data source).
As a reminder: we are working with a small part of a study of the long-term temperature dynamics of the beaver Castor canadensis in north-central Wisconsin. Body temperature was measured by telemetry every 10 minutes for four females, but here we only use data from a period of less than a day for two of the animals.
In the previous beavers exercise, you combined the data of two beavers into one table, which you saved on your computer. That file is the starting point of this exercise.
Exercise A: Import data into Python
Load the table that combines the data of the two beavers into Python and do some preliminary data exploration using Python:
Which columns are in the data? What types of values do they contain?
How many different animals’ data do you have?
How many different values are possible for activity outside the retreat, and how many of each are in the data?
How many different days and timepoints are in the data?
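Here is a sketch of these exploration steps with pandas. The rows are made up and the column names (beaver, day, time, temp, activ) are assumptions. Replace them with the columns of the table you saved in the previous chapter, which you would load with pd.read_csv() and your own file name:

```python
import io
import pandas as pd

# Made-up rows; the column names are assumptions
csv_text = """beaver,day,time,temp,activ
1,346,840,36.33,0
1,346,850,36.34,0
2,307,930,36.58,0
2,307,940,37.82,1
"""
beavers = pd.read_csv(io.StringIO(csv_text))

print(beavers.columns.tolist())         # column names
print(beavers.dtypes)                   # value type of each column
print(beavers["beaver"].nunique())      # number of different animals
print(beavers["activ"].value_counts())  # how often each activity value occurs
print(beavers["day"].nunique(), beavers["time"].nunique())   # distinct days and time points
```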
# Your code here
Exercise B: Exploratory plotting
Whatever dataset you are handling, once you are acquainted with it (as you just did in Exercise A), the next thing you will want to do is exploratory data analysis through plotting.
In this exercise, we are not giving you specific directions for plots you need to make. Instead, knowing what data you have in your beavers table, think about what plots could be interesting and make them using Python.
Based on your plots, what can you learn about beavers? Try to learn as much as possible about these cute rodents from your data.
# Your code here
([**] Air quality)
With this exercise, you received a file “airquality.csv”, containing daily air quality measurements in New York, May to September 1973 (source).
Check the data in the terminal, then import the file into Python and perform exploratory analysis of data contents.
Based on what you learn, decide which plots would be useful to learn more about the data and make them in Python. Check the data source to learn the units of each column. Before plotting, convert your data to units commonly used in Europe.
What can you learn about this data with plotting?
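As an example of the unit conversion, here is a sketch. It assumes that the temperature column is called Temp and is in degrees Fahrenheit, and that the wind column is called Wind and is in miles per hour. Verify both names and units against the data source before reusing it:

```python
import pandas as pd

# Made-up rows; assumes Temp in degrees Fahrenheit and Wind in miles per hour
air = pd.DataFrame({"Wind": [10.0, 5.0], "Temp": [68.0, 86.0]})

air["Temp_C"] = (air["Temp"] - 32) * 5 / 9   # degrees F -> degrees C
air["Wind_ms"] = air["Wind"] * 0.44704       # mph -> m/s (1 mph = 0.44704 m/s)
print(air)
```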
# Your code here
([**] Hair and eye color statistics)
With this exercise, you received a file “HairEyeColor.csv”, containing the distribution of hair and eye color among statistics students. The table comes from a survey of students at the University of Delaware (source).
Check the data in the terminal, then import the file into Python and perform exploratory analysis of data contents. Based on what you learn, decide which plots would be useful to learn more about the data and make them in Python.
What can you learn about this data with plotting?
# Your code here
([**] Cell location)
In this exercise on data analysis you will practise importing data into Python and applying simple mathematical operations to matrices (NumPy arrays) and data frames.
Import “cell_location.txt” data into Python and print the first ten rows.
This dataset follows a single simulated cell as it diffuses. The two columns represent the x and y coordinates of the cell, respectively.
# Your code here
As you can see, the x and y coordinates are offset. Subtract this offset from the dataset and save the new data to a file called “cell_location_new.txt”.
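Here is a sketch of the subtraction and saving steps. It assumes the file has one row per time point, with the x and y coordinates in two columns. It also assumes the offset is the starting position of the cell. If the offset in your data is a different constant, subtract that instead. The three positions are made up:

```python
import numpy as np

# Made-up positions standing in for np.loadtxt("cell_location.txt");
# one row per time point, columns = x and y (assumed layout)
positions = np.array([[105.0, 52.0],
                      [106.5, 51.0],
                      [104.0, 53.5]])

# Assumption: the offset is the starting position, so the trajectory starts at (0, 0)
offset = positions[0]
new_positions = positions - offset   # broadcasting subtracts the offset from every row

np.savetxt("cell_location_new.txt", new_positions)
print(new_positions)
```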
# Your code here
Now, plot the trajectory of the cell from your new cell position data. Don’t forget to annotate the axes and generally make a visually pleasing plot.
# Your code here
What is the total distance travelled by the cell?
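One way to compute this is to add up the lengths of all steps between consecutive positions. The sketch below does so for a made-up three-point trajectory:

```python
import numpy as np

# Made-up trajectory: one row per time point, columns x and y
positions = np.array([[0.0, 0.0],
                      [3.0, 4.0],
                      [3.0, 0.0]])

steps = np.diff(positions, axis=0)                  # displacement between consecutive points
step_lengths = np.hypot(steps[:, 0], steps[:, 1])   # Euclidean length of each step
total_distance = step_lengths.sum()
print(total_distance)
```

Here the total distance is 5 + 4 = 9, whereas the net displacement from start to end is only 3. Make sure you answer the question that is asked.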
([**] Radioactive decay)
In this exercise you will simulate the radioactive decay of polonium, with a half-life of 388 years. Start with 1,000 radioactive atoms and follow the decay for 4,000 years, recording the number of remaining atoms every ten years.
Exercise A: Exponential
Plot the number of radioactive polonium atoms versus time, using \(N(t) = N_0(\frac{1}{2})^{t/t_{\frac{1}{2}}}\), where:
\(N(t)\): The number of radioactive atoms remaining at time t
\(N_0\): The initial number of radioactive atoms
\(t\): Time elapsed
\(t_{1/2}\): The half-life of the substance
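Here is a sketch of evaluating this formula on the ten-year grid described above, using NumPy's element-wise arithmetic:

```python
import numpy as np

N0 = 1000                    # initial number of atoms
half_life = 388              # years, as given in the exercise
t = np.arange(0, 4001, 10)   # every ten years from 0 to 4,000 years

N = N0 * 0.5 ** (t / half_life)   # N(t) for every time point at once
```

After ten half-lives (3,880 years), N(t) = 1000 · (1/2)^10 ≈ 0.98 atoms, which is a quick sanity check on the result.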
# Your code here
Exercise B: Stochastic
Radioactive decay is a stochastic process. Instead of following a smooth exponential, each atom has a fixed probability of decaying per year. For a half-life of 388 years, this probability is \(1 - (\frac{1}{2})^{1/388} \approx \ln 2 / 388 \approx 1/560\).
Write a for loop in which np.random.rand() is used to simulate how many atoms decay within each ten-year step. Then plot the number of atoms every ten years together with the theoretical result in one plot.
Don't forget to add a legend. Add a second, zoomed-in plot of a section (use subplots!).
Note: np.random.rand() gives a random number between 0 and 1 drawn from a uniform distribution. We can use it to simulate stochastic events by writing code that effectively says "do x only if np.random.rand() < p", where p is the probability of the event.
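Here is a sketch of one way to set up the stochastic simulation. It assumes the per-year decay probability consistent with a 388-year half-life, \(1 - (\frac{1}{2})^{1/388}\). It combines ten years into one step with probability \(1 - (\frac{1}{2})^{10/388}\). decay_step is a hypothetical helper name:

```python
import numpy as np

def decay_step(n_atoms, p_decay):
    """Return how many of n_atoms survive one time step, where each atom
    independently decays with probability p_decay."""
    survivors = 0
    for _ in range(n_atoms):
        if np.random.rand() >= p_decay:   # this atom does not decay in this step
            survivors += 1
    return survivors

np.random.seed(1)                     # reproducible run
p_ten_years = 1 - 0.5 ** (10 / 388)   # probability of decaying within ten years

counts = [1000]                       # number of atoms at t = 0
for _ in range(400):                  # 400 steps of ten years = 4,000 years
    counts.append(decay_step(counts[-1], p_ten_years))
```

Plotting counts against np.arange(0, 4001, 10) alongside the exponential gives the comparison asked for.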
# Your code here
([**] DIY plotting)
If you want to challenge yourself further with importing, manipulating, and plotting data, you can download a lot of interesting public datasets from Kaggle’s website.
([***] Projectile motion)
Although it is nice (and valuable) to calculate the range of projectile motion by hand, it is much more convenient to write a program that lets you study this motion and do various calculations within a few seconds. The basic equations are:
\[ x(t) = v\cos(\theta)\,t \]
\[ y(t) = v\sin(\theta)\,t - \frac{1}{2}gt^2 \]
where \(x\) and \(y\) are the respective position components, \(g\) is the gravitational acceleration, \(t\) indicates time, \(v\) is the velocity of the throw and \(\theta\) is the angle the ball was thrown at.
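Here is a sketch of computing the trajectory with NumPy and keeping only the part above the ground with a boolean mask. The values v = 20 m/s, θ = π/4 and a 4 s time window are illustrative, not the exercise values. They are chosen so that the ball lands within the window. g = 9.81 m/s² is assumed:

```python
import numpy as np
import matplotlib.pyplot as plt

g = 9.81             # gravitational acceleration [m/s^2] (assumed value)
v = 20.0             # launch speed [m/s] (illustrative)
theta = np.pi / 4    # launch angle [rad] (illustrative)

t = np.linspace(0, 4, 401)                    # time points every 0.01 s
x = v * np.cos(theta) * t                     # horizontal position [m]
y = v * np.sin(theta) * t - 0.5 * g * t**2    # vertical position [m]

above_ground = y >= 0                         # True while the ball has not landed
x_flight = x[above_ground]
y_flight = y[above_ground]

plt.figure()
plt.plot(x_flight, y_flight)
plt.xlabel("x [m]")
plt.ylabel("y [m]")
```

With a coarser time grid, the last point kept can lie noticeably before the actual landing point.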
Exercise A:
Plot the (\(x,y\))-position of the projectile for \(\theta\) = \(\pi/3\) and \(v\) = 100 m/s during the first 15 seconds.
# Your code here
Exercise B:
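The plot above shows the projectile motion, but the equations do not stop the motion when the ball reaches the ground (\(y\) = 0). You can solve these equations for the landing time \(t_{max}\) and the corresponding range \(x_{max}\):
\[ t_{max} = \frac{2v\sin(\theta)}{g} \]
\[ x_{max} = v\cos(\theta)\,t_{max} = \frac{v^2\sin(2\theta)}{g} \]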
Write a function which returns \(x_{max}\) as a function of inputs \(\theta\) and \(v\).
Make an array with 20 evenly spaced angles \(\theta\) in the \([0,\pi/2]\) domain, using numpy.linspace. Plot \(x_{max}\) as a function of \(\theta\) for \(v=10\) m/s.
Additionally, plot \(x_{max}\) as a function of \(\theta\) for \(v=20\) m/s.
From the two plots, which angle gives the largest \(x_{max}\)?
# Your code here