{
"cells": [
{
"cell_type": "markdown",
"id": "9965f1b0",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "cell-3b587ecf818e3119",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"# Exercises\n",
"\n",
"Each exercise carries the difficulty level indication:\n",
"- [*] Easy\n",
"- [**] Moderate\n",
"- [***] Advanced\n",
"\n",
"The highest difficulty level is meant for students who wish to challenge themselves and/or have previous experience in programming."
]
},
{
"cell_type": "markdown",
"id": "916978ec",
"metadata": {},
"source": [
"`````{exercise} [*] Mercury vapor pressure\n",
":class: dropdown\n",
"{download}`Download this exercise.<../Coding_exercises/Week_4/Nina/01_Hg_pressure.zip>`\n",
"\n",
"Dataset reference: [R pressure dataset](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/pressure.html).\n",
"\n",
"In this exercise, we will work with a file containing data on vapor pressure\n",
"of mercury (in millimeters of mercury) as a function of temperature (in degrees \n",
"Celsius). You received this file, called `pressure.csv`, with this exercise.\n",
"\n",
"**To start**: Explore\n",
"\n",
"Navigate to and open the data file in the terminal.\n",
"What does it look like? Is it in line with what you'd expect from a CSV file \n",
"(what kind of delimiter do you see)? Are there header rows?\n",
"\n",
"**Exercise A**: Open\n",
"\n",
"Open the file in Python using NumPy's function `np.loadtext()`. \n",
"\n",
"Hints:\n",
"* What is the location of your file with respect to this Python script?\n",
"* Do you need to set anything else in your function?\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise B**: Shape\n",
"\n",
"What is the size of your data (number of rows and columns)?\n",
"Check it with Python and print the output.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise C**: Modify\n",
"\n",
"Degrees Celsius and millimiters of mercury are not SI units for temperature \n",
"and pressure. Using Python, modify your data so that the values are \n",
"expressed in SI units.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise D**: Save\n",
"\n",
"Now that you have changed your data into SI units that we want to use for further \n",
"analysis, save it!\n",
"You can name the new file `pressure_SI_units.csv`.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"`````"
]
},
{
"cell_type": "markdown",
"id": "cf16c237",
"metadata": {},
"source": [
"`````{exercise} [**] Cars\n",
":class: dropdown\n",
"{download}`Download this exercise.<../Coding_exercises/Week_4/Nina/02_Cars.zip>`\n",
"\n",
"In this exercise, we will work with a file containing data on speeds of cars\n",
"and the distances required to stop. You received this file, called `cars.csv`, \n",
"with this exercise.\n",
"\n",
"**Exercise A**: File contents\n",
"\n",
"Navigate to and open the data file in the terminal. What does it look like? \n",
"\n",
"Open the file in Python using NumPy. How many rows are there? What do the numbers look like? Are the \n",
"values in line with your expectation? If not, can you think of reasons why? \n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"\n",
"**Exercise B**: Making sense of data\n",
"\n",
"Navigate to the [data source](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/cars.html) and learn more about your dataset.\n",
"\n",
"With this information, do the values in the file make more sense?\n",
"\n",
"Transform the data into units that we commonly use and save it \n",
"using NumPy.\n",
"\n",
"Are the distances what you'd expect from modern cars?\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise C**: pandas\n",
"\n",
"Could you also import, manipulate, and save this data into Python using pandas? \n",
"Try it!\n",
"\n",
"Is it easier or more difficult compared to NumPy?\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"`````"
]
},
{
"cell_type": "markdown",
"id": "2c514635",
"metadata": {},
"source": [
"`````{exercise} [*] Proteins \n",
":class: dropdown\n",
"{download}`Download this exercise.<../Coding_exercises/Week_4/Verena/01_Importing_data.zip>`\n",
"\n",
"In this exercise we will start with importing data, either the whole or partial\n",
"file, from different file formats. For this, we will use the pandas package.\n",
"\n",
"To start: Navigate to and open the data file in the terminal.\n",
"What does it look like? What kind of delimiter do you see?\n",
"Are there header rows?\n",
"\n",
"**Exercise A**: CSV file\n",
"\n",
"To import a CSV file, we will use `pd.read_csv(r\"\\path\\to\\file\\filename.csv)`. \n",
"\n",
"You received the file \"uniprotkb_organism_id_9606_AND_reviewed_2024_06_19.csv\" \n",
"with this exercise. Read the file using pandas and find out in which format the \n",
"data is stored using the `type` command. Then, print the data and analyse the \n",
"output of the print command. \n",
"\n",
"What information on the data can you find? \n",
"E.g., how many rows and columns does your data have?\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"\n",
"**Exercise B**: Excel file \n",
"\n",
"Instead of a CSV file, you can also read Excel files using \n",
"`pd.read_csv(r\"\\path\\to\\file\\filename.xlsx)`. Try it! \n",
"We've provided you with an Excel file carrying the same name as the above CSV.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise C**: Importing with parameters\n",
"\n",
"pandas stores data in a so-called data frame, which is a 2D table with indexed \n",
"rows and columns. When printing the data, you will find the row index to the left from \n",
"the respective row (starting at 0,1,2,....). The top-most row gives the column \n",
"names. For the imported data, from only looking at the dataframe information \n",
"(not the original file), give the names of all the columns. \n",
"\n",
"```\n",
"# Solution: Entry, Reviewed, Entry name, Protein names, Gene Names, Organism, Length\n",
"```\n",
"\n",
"There are a few important parameters to `pd.read_csv()` that you can play \n",
"around with. A (non-exhaustive list) is:\n",
"- `sep`: The default separator between columns in CSV files is a column, but if \n",
"you have a different delimeter, specify it using `sep =`.\n",
"- `header`: Gives the row that has the column names. The default is to use the \n",
"0-th row as column names. If your data has no headers, use `header = None`. Use \n",
"`header = 0` paired with `names = (you, column, names)` to manually override \n",
"the header names.\n",
"- `usecols`: select the colums to be imported. You can both use the column \n",
"numbers (e.g., `usecols = [1,2]`) or headers (e.g., `usecols=[\"Reviewed\",\"Entry Name\"]`).\n",
"- `nrows`: Select the number of rows you want to use.\n",
"\n",
"For this exercise, import the first two columns and first 30 rows of the data.\n",
"Use CSV file as in exercise A.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise D**: Text files\n",
"\n",
"Of course, Excel isn't the only data type you might need to read. Another \n",
"common file type is a text file. For this, we will use the \n",
"\"Phosphorylation_Y.txt\" file you received with this exercise. It describes \n",
"tyrosine phosphorylation sites, including the UniProt IDs, tyrosine site, \n",
"phosphorylated motif, and more. \n",
"\n",
"For importing, we will use `pd.read_table()`, where you put your file location \n",
"between brackets as before.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise E**: Writing files\n",
"\n",
"We can also write files directly from Python into a text, CSV, or Excel format. \n",
"To do that, we use `dataframe_name.to_*`, where `dataframe_name` is the name of \n",
"your DataFrame, and * indicates either a CSV or Excel and will determine what \n",
"type we write to. \n",
"\n",
"Use this to write the first two columns and 30 rows of the phosphorylation data \n",
"into a CSV file. Don't forget to specify the filepath when you save the file! \n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"`````"
]
},
{
"cell_type": "markdown",
"id": "f2f87daa",
"metadata": {},
"source": [
"`````{exercise} [**] Beavers\n",
":class: dropdown\n",
"{download}`Download this exercise.<../Coding_exercises/Week_4/Nina/03_Beavers_combine.zip>`\n",
"\n",
"Here we will work with a small part of a study of the long-term temperature \n",
"dynamics of the beaver *Castor canadensis* in north-central Wisconsin. Body \n",
"temperature was measured by telemetry every 10 minutes for four females, but \n",
"data from one period of less than a day for two animals is used here ([dataset source](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/beavers.html)).\n",
"\n",
"**Exercise A**: Explore the data\n",
"\n",
"What do the two provided files for beaver_1 and beaver_2 look like? Use the terminal\n",
"to explore them.\n",
"\n",
"**Exercise B**: Import\n",
"\n",
"Based on what you observed in Exercise A, decide on a Python package and use\n",
"it to import the data files (`beaver1.csv` and `beaver2.csv`) into Python.\n",
"\n",
"Is the same number of measurements available for both animals? If not, make a print\n",
"statement saying which beaver has more measurements and how many more.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"\n",
"**Exercise C**: Average temperature\n",
"\n",
"Which beaver had a higher average body temperature? Print the mean values for\n",
"both animals.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise D**: Combine\n",
"\n",
"Combine the data for two beavers into a single table and save it as an Excel\n",
"file. \n",
"\n",
"Which function is useful for concatenating the data?\n",
"Will you be able to distinguish which measurements came from which \n",
"beaver after you concatenate the data? If not, how can you go about this\n",
"problem?\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"`````"
]
},
{
"cell_type": "markdown",
"id": "923247f0",
"metadata": {},
"source": [
"`````{exercise} [**] Friends \n",
":class: dropdown\n",
"{download}`Download this exercise.<../Coding_exercises/Week_4/Verena/02_Pandas_df.py>`\n",
"\n",
"\n",
"pandas stores data in a 2D table called a DataFrame. \n",
"In this exercise we will learn how to work with such a data frame. \n",
"\n",
"**Exercise A**: Creating a DataFrame\n",
"\n",
"For our exercise purposes, we will create a simple DataFrame. \n",
"To do that, you first create a dictionary, with the column title corresponding \n",
"to the data for that column. \n",
"\n",
"For example, to get a column of ages and names, the command would be:\n",
"\n",
"```\n",
"data_friends = {\"Name\": [\"Alex\",\"Alin\", \"Lucia\", \"Tessa\"], \"Age\": [22,21,23,21]}\n",
"```\n",
"\n",
"We can convert this into a dataframe using:\n",
"\n",
"```\n",
"df_friends = pd.DataFrame(data).\n",
"```\n",
"\n",
"Similarly, create a DataFrame with the names, ages, gender and hair colour of \n",
"your friends and/or family members.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise B**: Creating a DataFrame II\n",
"\n",
"Another way to create a DataFrame is directly via the `pd.DataFrame` command. \n",
"For this, you must specify the data, column names and index labels (if present). \n",
"The index labels will replace 0,1,2,3 etc., as the labels for the rows. \n",
"\n",
"For example, to create the friends DataFrame we can write:\n",
"\n",
"```\n",
"df_friends = pd.DataFrame([[22,\"male\",\"brown\"],\n",
" [21,\"male\",\"brown\"],\n",
" [23,\"other\",\"brown\"],\n",
" [21,\"female\",\"brown\"]],\n",
" columns= [\"Age\",\"Gender\",\"hair colour\"],\n",
" index= [\"Alex\",\"Alin\",\"Lucia\",\"Tessa\"]) \n",
"```\n",
"\n",
"Note that in this case we don't need the column label \"names\" as the names \n",
"are now not a column, but the labels for the rows.\n",
"\n",
"Create a DataFrame with the same data as above directly using the DataFrame \n",
"command. For the exercises below, we will be using this DataFrame.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"\n",
"**Exercise C**: Retrieving specific data from DataFrames\n",
"\n",
"In DataFrames, it's possible to access only certain rows or columns of a data, \n",
"just as with matrices. \n",
"\n",
"To get a specific column, use `df_friends[\"gender\"]` or any other column name.\n",
"To access a row, we use the command `df_friends.loc[\"Lucia\"]` to access the row\n",
"with the label \"Lucia\".\n",
"\n",
"From DataFrame `df_friends`:\n",
"1. Create and print series (a 1D array in pandas) of the ages of your friends.\n",
"2. Create and print a series that gives all the data of one of your friends.\n",
"3. We can also use the command `df_friends.iloc[2]` to get all the information \n",
"about Lucia, where we use 2 to denote the row we want to access. \n",
"Try it yourself!\n",
"4. We can also get only the rows that correspond to a specific condition, as \n",
"seen in [the book](../chapter9/data-pandas.ipynb). Give all rows with male as gender.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"\n",
"**Exercise D**: Iterating over rows\n",
"\n",
"We can iterate over rows using `dataframe_name.iterrows()`. \n",
"Use the command: \n",
"\n",
"```\n",
"for i,j in df_friends.iterrows(): \n",
" print(i,j) \n",
"```\n",
" \n",
"What do `i` and `j` represent in this case? \n",
"What happens if you iterate only over one, not two variables, e.g., only `i`? \n",
"\n",
"Print the types for `i` and `j` in the first, and just `i` in the second case.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise E**: Iterating over columns\n",
"\n",
"To iterate over a column, we don't have a specific command. Rather, we first \n",
"extract the column names by converting the dataframe into a list. Check for \n",
"yourself that doing so will give you only the names of the columns!\n",
"\n",
"We can now iterate using column indexing (as seen earlier in this exercise) \n",
"and a `for` loop over the column names. Create a `for` loop that prints the \n",
"contents of each column `i`. \n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"**Exercise F**: Mathematical operations\n",
"\n",
"We can perform mathematical operations on DataFrames and Series similar to how \n",
"we work with matrices and arrays in NumPy. Take the DataFrame:\n",
"\n",
"```\n",
"df_abcd = pd.DataFrame([[2,6,3,6],[5,2,8,5],[3,1,7,5],[6,7,4,8]],\n",
" columns= [\"A\",\"B\",\"C\",\"D\"]) \n",
"```\n",
"\n",
"*Part I*. Perform the following operations:\n",
"1. Multiply by 4\n",
"2. Subtract 7\n",
"3. Multiply the dataframe by 2*pi and take the sine\n",
"4. Find the mean value of the columns of the DataFrame, and of the rows of the \n",
"DataFrame\n",
"\n",
"*Part II*. We can also perform operations with two Series. Next take the 0th and 1st row \n",
"of the DataFrame and multiply them. Then, subract the 0th row from the DataFrame.\n",
"\n",
"```\n",
"# Your code here\n",
"```\n",
"\n",
"`````"
]
}
],
"metadata": {
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}