{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Files with NumPy\n", "\n", "```{admonition} Interactive page\n", ":class: warning, dropdown\n", "This is an interactive book page. Press the launch button at the top right side.\n", "```\n", "\n", "Let's explore NumPy functions for working with data files.\n", "\n", "## Loading, opening, and reading a file\n", "\n", "To load data from a text file into Python using NumPy, you can use its function `np.loadtxt()`. To learn more about this function, use `?np.loadtxt` or see the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html). Let's try it out.\n", "\n", "```{admonition} File location and common bugs\n", ":class: tip, dropdown\n", "The code below works when a file (such as `v_vs_time.dat`) is located **in the same folder** as your Python script. If your file is somewhere else, e.g., one folder up, you have to use your knowledge of [navigating directories in Terminal](../chapter2/bash-commands.ipynb). You would then use `data = np.loadtxt(\"../v_vs_time.dat\")` where `../` indicates the location of your file. \n", "\n", "In general, if your **file won't load** into Python, often you're either looking for it in the wrong directory, or have a typo in either the path or the file name. For this, the terminal in VS Code can come in handy with the `pwd` and `ls` commands.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "import numpy as np \n", "\n", "data = np.loadtxt(\"v_vs_time.dat\")\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we loaded the data from the `v_vs_time.dat` file and printed it. While not obvious from the data itself, the file name suggests what this data is about: measurements of voltage as a function of time. In this case, the data is recorded in two columns of numbers in a text file, where the numbers in the two columns are separated by a space. 
The first number is the time of the measurement (in seconds) and the second is the measured voltage (in volts).\n", "\n", "This is an example of a **DSV** (delimiter separated value) file - the **delimiter** (i.e., the thing that separates the values in two columns) is a space ` `. Another common delimiter is a comma `,` and such files are **CSV** (comma separated value) files. CSV is a common \"export\" format from spreadsheets like Excel. A third common delimiter is `Tab` with **TSV** (tab separated value) files (also available as an export option from some spreadsheets). When values are separated by `Tab` characters, this shows up in Python strings as a special character `\\t`. \n", "\n", "````{admonition} New line character\n", ":class: note, dropdown\n", "Similar to `\\t` for `Tab`, Python has a new line character `\\n`. For instance, if you use this `print` statement: \n", "```\n", "print(\"Hello world! \\nGoodbye!\")\n", "```\n", "it prints in two lines: \n", "```\n", "Hello world!\n", "Goodbye!\n", "```\n", "````\n", "\n", "The NumPy function `np.loadtxt()` can handle any type of delimiter: the default is any whitespace (`Tab` or space), but this can also be changed using the `delimiter=` keyword argument of `np.loadtxt()`. `np.loadtxt()` also does not care about the file extension, e.g., it is fine if a CSV file does not have a `.csv` extension (it doesn't even have to have an extension). Files containing [ASCII](https://en.wikipedia.org/wiki/ASCII) text data often carry the extension `.dat`.\n", "\n", "What if the file above was a comma separated file, and it also had a header describing what the columns actually are? How would we import it?\n", "\n", "```{admonition} A glance into a file\n", ":class: tip, dropdown\n", "When working with a new text file, how do you check what the delimiter is, and what the file looks like in general? 
To get a quick glance at the file's content: open your terminal, navigate to the directory with the file, and then read the file using `less file_name`. This can help you determine how to use `np.loadtxt()` to successfully load your file.\n", "```\n", "\n", "```{figure} ../images/chapter9/csv-file.png\n", "---\n", "height: 300px\n", "name: csv-file\n", "---\n", "`v_vs_time` file in CSV format with a header, as viewed in terminal with the `less` command.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Set delimiter to comma \",\" and skip first (header) row\n", "data_2 = np.loadtxt(\"v_vs_time_2.csv\", delimiter=\",\", skiprows=1)\n", "print(data_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we saw how to \"tweak\" `np.loadtxt()` so that it works for our second file! As a general note: it's very useful to **explore which arguments a Python function can take**.\n", "\n", "In both cases above, we assigned the return value of `np.loadtxt()` to a variable (`data` and `data_2`), which are NumPy arrays (you can check this with the `type()` function). When we printed our variables, we saw this:\n", "\n", "```\n", "[[ 0.00000000e+00 -4.88196842e-01]\n", " [ 1.00100100e-02 6.57403884e-01]\n", " [ 2.00200200e-02 -4.86876718e-01]\n", " ...\n", " [ 9.97997998e+00 2.11430345e+01]\n", " [ 9.98998999e+00 1.94693126e+01]\n", " [ 1.00000000e+01 1.82114232e+01]]\n", "```\n", "\n", "The dots `...` indicate that NumPy omitted some lines in the middle (this happens when the array is large, otherwise it could completely clutter your screen). So how large is our array `data`? 
We can find out with `np.shape()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "print(np.shape(data))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Therefore, when `np.loadtxt()` loads a file like this one, it returns a 2D NumPy array with shape `(n,m)`, where `n` is the number of data lines in the file and `m` is the number of columns (here, we have 1000 rows and 2 columns). \n", "\n", "As mentioned above, the first column represents the time in seconds when the measurement was taken, and the second column represents the measured voltage in volts. We will typically want to extract these into two vectors `t` and `v`. We can do this using slicing:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "t = data[:,0]\n", "v = data[:,1]" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-c63f7111a8fd6835", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "We can look at the first ten points of each vector to see that we have successfully loaded the data from the file and extracted it into vectors." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [ "remove-output" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "t = [0. 0.01001001 0.02002002 0.03003003 0.04004004 0.05005005\n", " 0.06006006 0.07007007 0.08008008 0.09009009]\n", "v = [-0.48819684 0.65740388 -0.48687672 -0.85813524 1.60596293 -0.54581193\n", " -1.91012194 -0.82136067 -1.72498239 1.50810942]\n" ] } ], "source": [ "print('t = ', t[0:10])\n", "print('v = ', v[0:10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should always double-check that the data was **loaded correctly** by comparing it with the file opened in terminal, Excel, or another application. 
It's also possible to open the file in Python and print its contents line by line - let's see how to do this for the first ten lines:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "with open(\"v_vs_time.dat\") as file:\n", "    for i in range(10):\n", "        # Each line already ends with a newline, so suppress the one print() adds\n", "        print(file.readline(), end=\"\")" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-0793eb4c8f85801b", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Saving a file\n", "\n", "We can also save data using NumPy's `np.savetxt()`. To do this, we first have to make sure that the data is a NumPy array of the correct shape. \n", "Let's take a look at an example where we calculate the square of the measured voltage (the vector `v` from above) and save this back into a new file:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "v_squared = v**2" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-fb66afd2a19403c5", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "To \"pack\" the new vector `v_squared` and the unchanged time vector `t` together into a matrix like the one returned by `np.loadtxt()`, we can first create a 2D matrix of the correct size and then use slicing with an assignment operator `=` to give the columns the correct values:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Create an array of zeros with one row per data point and two columns\n", "matrix_to_save = np.zeros([len(v), 2])\n", "\n", "# Fill columns with data\n", "matrix_to_save[:,0] = t\n", "matrix_to_save[:,1] = v_squared" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-7823cff287a7691f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": 
[ "Now we are ready to use `np.savetxt()`, which (unless specified otherwise) saves the file in the current directory: " ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "np.savetxt(\"vsquare_vs_time.dat\", matrix_to_save)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This creates a file in your workspace, which you can then open to see what's inside it. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "thebe-remove-input-init" ] }, "outputs": [], "source": [ "import micropip\n", "await micropip.install(\"jupyterquiz\")\n", "from jupyterquiz import display_quiz\n", "import json\n", "\n", "with open(\"questions6.json\", \"r\") as file:\n", " questions=json.load(file)\n", " \n", "display_quiz(questions, border_radius=0)" ] } ], "metadata": { "celltoolbar": "Create Assignment", "jupytext": { "formats": "ipynb,md" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" }, "toc": { "base_numbering": "5", "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "249.797px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }