9.2. Files with NumPy

Let’s explore NumPy functions for working with data files.

9.2.1. Loading, opening, and reading a file

To load data from a text file into Python using NumPy, you can use the function np.loadtxt(). To learn more about this function, run ?np.loadtxt or see the NumPy documentation. Let’s try it out.

import numpy as np 

data = np.loadtxt("v_vs_time.dat")
print(data)

Here we loaded the data from the v_vs_time.dat file and printed it. While not obvious from the data itself, the file name suggests what this data is about: measurements of voltage as a function of time. In this case, the data is recorded in two columns of numbers in a text file, where numbers in two columns are separated by a space. The first number is the time of the measurement (in seconds) and the second is the measured voltage (in volts).

This is an example of a DSV (delimiter-separated values) file: the delimiter (i.e., the thing that separates the values in two columns) is a space. Another common delimiter is a comma, and such files are CSV (comma-separated values) files. CSV is a common “export” format from spreadsheets like Excel. A third common delimiter is Tab, used in TSV (tab-separated values) files (also available as an export option from some spreadsheets). When values are separated by Tab characters, this shows up in Python strings as the special character \t.

The NumPy function np.loadtxt() can handle any type of delimiter: the default is any whitespace (Tab or space), but this can be changed using the delimiter= keyword argument. np.loadtxt() also does not care about the file extension, e.g., it is fine if a CSV file does not have a .csv extension (it doesn’t even have to have an extension). Files containing ASCII text data often carry the extension .dat.
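To make the role of the delimiter concrete, here is a minimal sketch. It does not use a real data file: the sample values are made up and kept in memory with io.StringIO (np.loadtxt() accepts file-like objects as well as file names).

```python
import io
import numpy as np

# Made-up sample values, stored in memory instead of in a file on disk
tab_text = "0.0\t1.5\n0.1\t2.5\n0.2\t3.5\n"
comma_text = "0.0,1.5\n0.1,2.5\n0.2,3.5\n"

# Default behaviour: columns are split on any whitespace, which includes Tab
from_tabs = np.loadtxt(io.StringIO(tab_text))

# Commas have to be requested explicitly with the delimiter= keyword
from_commas = np.loadtxt(io.StringIO(comma_text), delimiter=",")

# Both calls produce the same 3x2 array
print(np.array_equal(from_tabs, from_commas))
```

Only the delimiter= argument differs between the two calls; the resulting arrays are identical.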

What if the v_vs_time data were stored in a comma-separated file that also had a header describing what the columns actually are? How would we import it?

../../_images/csv-file.png

Fig. 9.2 The v_vs_time file in CSV format with a header, as viewed in a terminal with the less command.

import numpy as np

# Set delimiter to comma "," and skip first (header) row
data_2 = np.loadtxt("v_vs_time_2.csv", delimiter=",", skiprows=1)
print(data_2)

Here we saw how to “tweak” np.loadtxt() so that it works perfectly for our second file! As a general note: it’s very useful to explore which arguments a Python function can take.

In both cases above, we assigned the return value of np.loadtxt() to a variable (data and data_2). Both variables are NumPy arrays (you can check this with the type() function). When we printed our variables, we saw this:

[[ 0.00000000e+00 -4.88196842e-01]
 [ 1.00100100e-02  6.57403884e-01]
 [ 2.00200200e-02 -4.86876718e-01]
 ...
 [ 9.97997998e+00  2.11430345e+01]
 [ 9.98998999e+00  1.94693126e+01]
 [ 1.00000000e+01  1.82114232e+01]]

The dots ... indicate that NumPy omitted some lines in the middle (this happens when the array is large; otherwise the output could completely clutter your screen). So how large is our array data? We can find out with np.shape():

print(np.shape(data))

Therefore, when np.loadtxt() loads a file, it returns a 2D NumPy array with shape (n, m), where n is the number of lines in the file and m is the number of columns (here, we have 1000 rows and 2 columns).
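The relationship between the file layout and the array shape can be checked on a tiny example. This is only a sketch: the three lines of sample data are made up and read from memory via io.StringIO rather than from v_vs_time.dat.

```python
import io
import numpy as np

# Three made-up lines with two whitespace-separated columns each
sample = io.StringIO("0.0 1.0\n0.5 2.0\n1.0 3.0\n")
small = np.loadtxt(sample)

# The return value is a NumPy array ...
print(type(small))

# ... whose shape is (number of lines, number of columns)
n_rows, n_cols = np.shape(small)
print(n_rows, n_cols)
```

The same information is also available as the attribute small.shape.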

As mentioned above, the first column represents the time in seconds when the measurement was taken, and the second column represents the measured voltage in volts. We will typically want to extract these into two vectors t and v. We can do this using slicing:

t = data[:,0]
v = data[:,1]

We can choose to look at the first ten points, and see that we have successfully loaded the data from the file and extracted it into vectors.

print('t = ', t[0:10])
print('v = ', v[0:10])
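As an alternative to slicing, np.loadtxt() has an unpack=True keyword argument that transposes the result, so the columns can be assigned directly to separate variables. The sketch below uses made-up in-memory data, and the names t_demo and v_demo are chosen only for this illustration.

```python
import io
import numpy as np

# Made-up two-column sample: time in the first column, voltage in the second
sample = io.StringIO("0.0 1.0\n0.5 2.0\n1.0 3.0\n")

# unpack=True returns the columns as separate 1D arrays
t_demo, v_demo = np.loadtxt(sample, unpack=True)

print('t_demo = ', t_demo)
print('v_demo = ', v_demo)
```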

You should always double-check that the data was loaded correctly by comparing it with the file opened in terminal, Excel, or another application. It’s also possible to open the file in Python and print its contents line by line - let’s see how to do this for the first ten lines:

with open("v_vs_time.dat") as file:
    for i in range(10):
        # Each line already ends with a newline, so suppress the one added by print()
        print(file.readline(), end="")

9.2.2. Saving a file

We can also save data using NumPy’s np.savetxt(). To do this, we first have to make sure that the data is a NumPy array of the correct shape. Let’s take a look at an example where we calculate the square of the measured voltage (the vector from above) and save this back into a new file:

v_squared = v**2

To “pack” the new vector v_squared and the unchanged time vector t together into a matrix like the one returned by np.loadtxt(), we can first create a 2D matrix of the correct size and then use slicing with an assignment operator = to give the columns the correct values:

# Create an array of zeros with one row per data point and two columns
matrix_to_save = np.zeros([len(v), 2])

# Fill columns with data
matrix_to_save[:,0] = t
matrix_to_save[:,1] = v_squared
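A more compact way to build the same kind of two-column array is np.column_stack(), which places 1D arrays side by side as columns. This is a sketch with made-up values; t_demo and v_squared_demo are stand-ins for the vectors t and v_squared.

```python
import numpy as np

# Made-up stand-ins for the time vector and the squared voltage
t_demo = np.array([0.0, 0.5, 1.0])
v_squared_demo = np.array([1.0, 4.0, 9.0])

# Each 1D array becomes one column of the resulting 2D array
packed = np.column_stack((t_demo, v_squared_demo))

print(packed.shape)
print(packed)
```

Either approach works; preallocating and filling by slicing makes the intended shape explicit, while np.column_stack() does it in one call.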

Now we are ready to use np.savetxt(), which (unless specified otherwise) saves the file in the current directory:

np.savetxt("vsquare_vs_time.dat", matrix_to_save)

This creates a file in your workspace, which you can then open to see what’s inside it.
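np.savetxt() also accepts delimiter= and header= keyword arguments, so a saved file can describe its own columns. The following sketch uses made-up values and writes to a temporary directory (an assumption made here so that nothing in your workspace is touched); the file name demo.csv and the header text are likewise only illustrative.

```python
import os
import tempfile
import numpy as np

# Made-up data: time in the first column, squared voltage in the second
demo = np.array([[0.0, 1.0], [0.5, 4.0], [1.0, 9.0]])

# Write into a temporary directory so that no existing file is overwritten
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.csv")

    # The header is written as a comment line starting with "# "
    np.savetxt(path, demo, delimiter=",", header="t (s), v^2 (V^2)")

    # Look at the first line of the file that was written
    with open(path) as f:
        first_line = f.readline()

    # np.loadtxt() skips lines starting with "#" by default
    reloaded = np.loadtxt(path, delimiter=",")

print(first_line, end="")
print(np.allclose(demo, reloaded))
```

Because the header is stored as a comment line, the file can be loaded again without skiprows.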
