4.1. Data types


In Python, we can assign values to variables. Variables have a property called their type. In the previous chapter, you saw a few lines of Python code in which we assigned a value (either an integer or a string) to a variable using the assignment operator =, and Python automatically picked the variable type that best fits the value. You can use the type() function to check the data type. Try it out for a and b:

a = 1
b = "Python is cool"

print(type(a))
print(type(b))

Function inside a function

In the code above, we used the line print(type(a)). In Python, you can nest one function call inside another; in this case, type() is inside print(). The effect is that we “print the type of a”. The inner function is evaluated first: type(a) produces the type of a, and that result is then passed to print().
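To make the order of evaluation explicit, a nested call can be split into two separate steps. This is a small sketch; the intermediate variable name type_of_a is only for illustration.

```python
a = 1

# Nested version: type(a) runs first, then its result is handed to print()
print(type(a))

# The same thing written as two explicit steps
type_of_a = type(a)  # step 1: the inner function runs and returns the type of a
print(type_of_a)     # step 2: the outer function receives that result
```

Both versions print the same line, <class 'int'>.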

4.1.1. Built-in data types

As you can imagine, there are many more data types in Python besides integer and string. Before we delve into some of these types in detail, here we offer a general overview.

Table 4.1 Python built-in data types.
Category	Data type	Example	Usage
Text type	`str`	`x = "Hello world"`	Textual data
Numeric types	`int`	`x = -10`	Integer numbers; counting; indexing
	`float`	`x = 2.5`	Real numbers with a fractional part (stored approximately)
	`complex`	`x = 2 + 5j`	Complex numbers
Boolean type	`bool`	`x = True`	Truth values; binary states; conditions in `if` statements
Sequence types	`list`	`x = ["math", "physics", "biology"]`	Ordered, mutable collections of items
	`tuple`	`x = ("math", "physics", "biology")`	Ordered, immutable collections of items
	`range`	`x = range(5)`	Sequences of numbers, often used in `for` loops
Mapping type	`dict`	`x = {"name": "Johan", "grade": 9.5}`	Storing and retrieving values by unique keys
Set types	`set`	`x = {"math", "physics", "biology"}`	Unordered collections of unique items; removing duplicates
	`frozenset`	`x = frozenset({"math", "physics", "biology"})`	Unordered collections of unique items that must stay immutable
None type	`NoneType`	`x = None`	Absence of a value (a null value)
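As a quick check of the table, the sketch below applies type() to one example value from each row and prints the result.

```python
# One example value from each row of Table 4.1, checked with type()
print(type("Hello world"))                              # <class 'str'>
print(type(-10))                                        # <class 'int'>
print(type(2.5))                                        # <class 'float'>
print(type(2 + 5j))                                     # <class 'complex'>
print(type(True))                                       # <class 'bool'>
print(type(["math", "physics", "biology"]))             # <class 'list'>
print(type(("math", "physics", "biology")))             # <class 'tuple'>
print(type(range(5)))                                   # <class 'range'>
print(type({"name": "Johan", "grade": 9.5}))            # <class 'dict'>
print(type({"math", "physics", "biology"}))             # <class 'set'>
print(type(frozenset({"math", "physics", "biology"})))  # <class 'frozenset'>
print(type(None))                                       # <class 'NoneType'>
```

Because dictionaries and sets both use curly brackets, note that {} on its own creates an empty dictionary; an empty set is written set().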

As you may have noticed, each data type has specific use cases. Depending on what your code does, you will choose the types that are appropriate for your variables. Next, we explore some of these types in more detail; others will be discussed later.

4.1.1.1. Text type

We’ve already encountered the “string” type str, which is used for pieces of text. To make a string, you enclose the text in either single quotes ' or double quotes ":

c = "This is a string"
d = 'This is also a string'

print(type(c))
print(type(d))
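One practical reason to have both kinds of quotes is that you can put one kind inside a string that is enclosed by the other kind. A short sketch (the variable names s1 and s2 are arbitrary):

```python
s1 = "It's easy to include an apostrophe inside double quotes"
s2 = 'and "double quotes" inside single quotes'

print(s1)
print(s2)
```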

You can also make multiline strings using three single quotes (''') or three double quotes ("""):

multi = \
'''
This string
has 
multiple lines.
'''

print(multi)

Note that we used a backslash \ at the end of the first line, which tells Python that the statement continues on the next line.
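Triple double quotes work the same way as triple single quotes. In this sketch, the opening quotes follow the = sign directly, so no backslash is needed. The built-in repr() shows what is actually stored: each line break is kept as a newline character, written \n.

```python
multi = """
This string
has
multiple lines.
"""

print(repr(multi))        # shows the newline characters as \n
print(multi.count("\n"))  # 4: one after the opening quotes, one after each text line
```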

Although it’s not obvious, Python can also do “operations” on strings: the + operator that you know from arithmetic also works with strings.

Exercise 4.1

Discover what the + operator does to a string: print the output of the sum of two strings.

# your code here

4.1.1.2. Numeric types

We’ve seen that variables with integer numbers get assigned the int type. If we assign a non-integer number to a variable, Python has to choose another type instead of int. For variable f in the code below, Python will select the type float, which corresponds to floating point numbers.

f = 2.5
print(type(f))

In general, Python chooses the type based on how the value is written: a whole number without a decimal point becomes an int, while a number with a decimal point becomes a float.

If you assign a new value to a variable, it can change the variable type:

f = 5
print(type(f))

Observe what happens in the following case:

f = 5
print(type(f))

f = f/2
print(type(f))

What happened in this code? First, we assigned the value 5 to variable f using the assignment operator =. In the line f = f/2, the expression f/2 divides the current value of f (which is 5) by 2. The result of this division is 2.5. The assignment operator = then assigns the value 2.5 to variable f. In essence, this line updates variable f to half of its previous value.

In this example, because 5/2 = 2.5, Python decided to change the type of variable f from int to float after the assignment operation f = f/2.
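In fact, in Python 3 the division operator / always returns a float, even when the result is a whole number. A minimal check:

```python
f = 4
g = f / 2

print(g)        # 2.0, not 2
print(type(g))  # <class 'float'>
```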

When you are using floating point numbers, you can also use “exponential” notation to specify very big or very small numbers:

f = 1.5e-8

In Python, the notation 1.5e-8 indicates the number \(1.5 \times 10^{-8}\).
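A few more examples of exponential notation. Note that a number written this way is always a float, even when its value is a whole number. The variable names are only for illustration.

```python
avogadro = 6.022e23   # 6.022 * 10**23
tiny = 1.5e-8         # 1.5 * 10**-8

print(avogadro)
print(tiny)
print(type(tiny))     # <class 'float'>
print(type(1e3))      # <class 'float'>, even though 1e3 equals the whole number 1000
print(2.5e3 == 2500)  # True
```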

A third numeric type that you may use in nanobiology is the complex number. In Python, you mark the imaginary part of a complex number with j, which is the Python notation for the imaginary unit \(i\):

d = 1+2j
print(type(d))

The j notation is special: there must be no space between the number preceding it (in this case 2) and j. This is how Python knows that you want to make a complex number, rather than refer to a variable named j. The number in front of the j can be any floating point number:

1 + 0.5j
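A complex number stores its real and imaginary parts separately, and you can read them with the .real and .imag attributes. The sketch also checks that 1j behaves like \(i\), whose square is -1. The imaginary unit on its own is written 1j, because a bare j would be read as a variable name.

```python
z = 1 + 0.5j

print(z.real)   # 1.0
print(z.imag)   # 0.5
print(1j * 1j)  # (-1+0j): the square of i is -1
```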

4.1.1.3. Boolean type

The “Boolean” type bool is another very useful type. Boolean variables can have only two values: True and False. You type them directly as True and False, with a capital first letter and no quotes.

g = False
print(type(g))

We will use Boolean types extensively later when we look at flow control, but a simple example using the if statement is given below. Don’t panic if you don’t understand the if statement yet; an entire section is dedicated to it later. For now, this is just an example of why Boolean variables exist.

g = True

if True:
    print("True is always true.")

if g:
    print("g is true!")
    
if not g:
    print("g is not true!")

You can try changing the value of g above to False and see what happens if you run the above code cell again.

Numbers and True/False statements

Numbers (both int and float) can also be used as conditions, for example in an if statement. Python interprets any number that is not zero as True and any number that is zero as False. Similar rules hold for strings: an empty string ("") is interpreted as False, while a non-empty one is True.
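The built-in bool() function applies exactly these rules, so it offers a quick way to check how Python will interpret a value in a condition:

```python
print(bool(0))        # False
print(bool(0.0))      # False
print(bool(-3))       # True: any non-zero number
print(bool(""))       # False: an empty string
print(bool("False"))  # True: a non-empty string, whatever it says
```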

Exercise 4.2

Discover which numbers can be used as True and False in Python by changing the value of g above and re-running the code. You can also try assigning different strings to g to see what happens.

4.1.1.4. Sequence types

It often happens that you want to store data that belong together (a collection of items). There are several options to do this. Here we will focus on lists and tuples. Later we will also see NumPy arrays, which are not a Python built-in type, but are extremely useful to scientists.

Think of storing personal data, such as first name, family name, street, house number, and city, in a list:

Person_1 = ['Jana', 'Bakker', 'Lorentzweg', 1, 'Delft']
print(Person_1)

# Person_1 type
print(type(Person_1)) 

# Type of the first and fourth elements (indices 0 and 3)
print(type(Person_1[0]), type(Person_1[3]))

Note that Person_1 itself is a list, while the items inside the list have their own types (here str and int).

Note

We can access elements in a list using square brackets [ ]. The first item has index 0, because Python starts counting from 0. Using a numerical index to access an element of a list is called indexing.
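A short sketch of indexing with the list from above. Because counting starts at 0, the last valid index is one less than the number of items, which len() reports.

```python
Person_1 = ['Jana', 'Bakker', 'Lorentzweg', 1, 'Delft']

print(Person_1[0])    # Jana: index 0 is the first item
print(Person_1[4])    # Delft: index 4 is the fifth and last item
print(len(Person_1))  # 5, so the valid indices run from 0 to 4
```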

If we make a mistake, we can replace an item in the list by assigning a new value to its index:

Person_1 = ['Jana', 'Bakker', 'Lorentzweg', 1, 'Delft']
Person_1[0] = 'Anna'
print(Person_1)

Another way to store data is a tuple. Note that the only difference in notation, compared to a list, is the use of parentheses ( ) instead of square brackets [ ].

Person_2 = ('Johan', 'Vos', 'Mekelweg', 5, 'Delft')
print(Person_2)
print(type(Person_2), type(Person_2[0]), type(Person_2[3]))

What is the difference between a list and a tuple? The most important difference is that tuples are immutable, i.e., you cannot change them as we could a list:

# This raises a TypeError, because the items of a tuple cannot be changed
Person_2 = ('Johan', 'Vos', 'Mekelweg', 5, 'Delft')
Person_2[1] = 'Jansen'

A peculiar case of a = b

Let’s look at a peculiar example in the code cell below. Run the code cell: what do you observe?

In this example, something strange happens. We did not change a, did we?

Something unexpected happens with the command b = a. Rather than creating a new copy of the list at a new spot in memory, Python makes b refer to the same object as a, so both names share the same memory address. When we use a, Python looks up the data stored at that unique address. You can see this using the id() function, which returns an object’s unique identity (try it out in the code cell below). This is important to know, because if we write b = a and then change the contents of b, the contents of a change as well! Therefore, in this example, b is not a copy of a with a new location in memory. Instead, b refers to the same location in memory.

Strictly speaking, b = a never makes a copy, whatever the type. The effect only becomes visible with mutable objects, because only those can be changed in place:

Mutable objects: content can be changed after creation (e.g., list, dict, set).
Immutable objects: content cannot be changed (e.g., int, float, str, tuple).

# A peculiar case of a = b

a = [2, 3, 4]
b = a
b[0] = -10
print(a)
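The sketch below uses id() to confirm that b = a creates a second name rather than a copy, and shows one way to get an independent copy: the list method .copy(). It also shows why the same kind of assignment causes no surprise with an immutable int: y = y + 1 creates a new int and makes y refer to it, which leaves x untouched.

```python
a = [2, 3, 4]
b = a                  # b is a second name for the same list
print(id(a) == id(b))  # True: same object in memory

c = a.copy()           # c is a new list with the same contents
print(id(a) == id(c))  # False: a different object
c[0] = -10
print(a)               # [2, 3, 4]: changing c does not affect a

x = 5
y = x                  # y and x refer to the same int object
y = y + 1              # creates a new int and makes y refer to it
print(x, y)            # 5 6
```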

We can also make tuples with only a single item. However, we must add a comma after the item; otherwise, Python treats the parentheses as ordinary grouping parentheses and does not create a tuple.

# Not a tuple
n_a_t = (1)
print(type(n_a_t))

# A tuple
a_t = (1,)
print(type(a_t))

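In Python, lists and tuples are both objects, and a variable is a name that refers to an object. The difference is that a list can be changed after it has been created (it is mutable), while a tuple cannot (it is immutable). Because it is immutable, a tuple typically requires a little less memory than a list with the same items. We can still use tuples (and lists) effectively in calculations: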

a = [2,3,5]
b = (2,3,5)
print(a[0]*2)
print(b[0]*2)
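To check the memory difference yourself, the standard-library function sys.getsizeof() reports the size in bytes of a container object itself (not of the items it refers to). The exact numbers depend on the Python version and platform, so treat the output as illustrative; in CPython, a tuple comes out smaller than a list holding the same items.

```python
import sys

a = [2, 3, 5]
b = (2, 3, 5)

list_size = sys.getsizeof(a)   # size of the list object in bytes
tuple_size = sys.getsizeof(b)  # size of the tuple object in bytes
print(list_size, tuple_size)   # exact values vary; the tuple is the smaller one
```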


4.1.1.5. Mapping type

Next, we take a closer look at the mapping type called a dictionary (dict). Python dictionaries store data as key: value pairs, where the values can be of any type and each value is associated with a unique key. A value in a dictionary can then quickly be accessed using its key.

d = {
    "name": "hemoglobin",
    "organism": "human",
    "length": 142
}

print(d)
print(type(d))

# Access the value associated with "length" key
print(d["length"])
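Asking for a key that is not in the dictionary raises a KeyError. Two built-in ways to avoid that are the in operator, which checks whether a key exists, and the dictionary method .get(), which returns None (or a default you supply) instead of raising an error. A short sketch, where "mass" is just an example of a missing key:

```python
d = {
    "name": "hemoglobin",
    "organism": "human",
    "length": 142
}

print("length" in d)     # True
print("mass" in d)       # False: "mass" is not a key in d
print(d.get("length"))   # 142
print(d.get("mass"))     # None instead of a KeyError
print(d.get("mass", 0))  # 0: the default value we supplied
```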

Here we have dictionary values of string (“hemoglobin”, “human”) and integer (142) types. In principle, dictionary values can be of any type:

d = {
    "name": "hemoglobin",
    "organism": "human",
    "length": 142,
    "multimeric": True,
    "subunits": ["alpha", "beta"]
}

print(d)

We mentioned that dictionary keys must be unique. Otherwise, how would Python know which dictionary item you’re referring to? Let’s see what happens if you introduce duplicate keys into a dictionary:

d = {
    "name": "hemoglobin",
    "organism": "human",
    "length": 142,
    "length": 95
}

print(d)

If you introduce duplicate keys, the last value overwrites the previous ones: here, the key “length” keeps only the value 95.

Unlike tuples, dictionaries are mutable: we can change, add, or remove items after a dictionary has been created. Look at the code below. Before running it, try to predict what the resulting dictionary will be.

d = {
    "name": "hemoglobin",
    "organism": "human",
    "length": 142
}

# Changing an item
d["length"] = 151

# Adding an item
d["multimeric"] = True

# Removing an item using pop()
d.pop("organism")

print(d)

If we want to quickly assess how many items a dictionary has, we can use the len() function:

d = {
    "name": "hemoglobin",
    "organism": "human",
    "length": 142,
    "multimeric": True,
    "subunits": ["alpha", "beta"]
}

print(len(d))

You will see many applications of dictionaries in bioinformatics. For instance, you can imagine storing sequences with their IDs as keys and sequences as values:

sequences_dictionary = {
    "seq_ID_1": "GTCCAGTGAC",
    "seq_ID_2": "TGGTACCGTA",
    "seq_ID_3": "TGCCGATAGG"
}

or storing information about genomic regions, with chromosome names and regions as keys and coordinates (plus gene names) as values:

genomic_coordinates_dictionary = {
    "chr1:1000-2000": {"start": 1000, "end": 2000, "gene": "geneA"},
    "chr2:3000-4000": {"start": 3000, "end": 4000, "gene": "geneB"}
}
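Both dictionaries are used the same way as before: you look up a value by its key. For the nested dictionary, a second pair of square brackets selects an item from the inner dictionary. A sketch using the example data above:

```python
sequences_dictionary = {
    "seq_ID_1": "GTCCAGTGAC",
    "seq_ID_2": "TGGTACCGTA",
    "seq_ID_3": "TGCCGATAGG"
}

genomic_coordinates_dictionary = {
    "chr1:1000-2000": {"start": 1000, "end": 2000, "gene": "geneA"},
    "chr2:3000-4000": {"start": 3000, "end": 4000, "gene": "geneB"}
}

# Look up a sequence by its ID and count its nucleotides
seq = sequences_dictionary["seq_ID_2"]
print(seq, len(seq))                     # TGGTACCGTA 10

# The value stored under a region key is itself a dictionary
region = genomic_coordinates_dictionary["chr1:1000-2000"]
print(region["gene"])                    # geneA
print(region["end"] - region["start"])   # 1000

# Two pairs of square brackets reach into the inner dictionary directly
print(genomic_coordinates_dictionary["chr2:3000-4000"]["gene"])  # geneB
```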


4.1.1.6. Set type

In addition to lists, tuples, and dictionaries, the set is another built-in data type used to store collections of data. A set is an unordered collection of unique items; in other words, duplicate items are not allowed in a set. The items themselves must be immutable (for example, a list cannot be an item of a set), but the set as a whole is mutable: you can add and remove items.

Sets are written with curly brackets in Python, like this:

my_set = {"dna", "rna", "protein"}

Try it out:

my_set = {"dna", "rna", "protein"}
print(my_set)

What happens if you have a duplicate value? Try adding another “dna” item to the set above and run the code.

It is often useful to know how many items are in a set. For this, you can use len(). Notice that a single set may contain items of different data types.

my_set = {"dna", "rna", "protein", False, 46, "lipid", "carbohydrate", True}
print(len(my_set))
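Because sets are mutable, items can be added and removed after creation with the set methods .add() and .remove(). Adding an item that is already present changes nothing, since a set keeps only unique items. Sets are unordered, so this sketch uses sorted() to print the items in a predictable order.

```python
my_set = {"dna", "rna", "protein"}

my_set.add("lipid")    # add a new item
my_set.add("dna")      # "dna" is already present, so nothing changes
my_set.remove("rna")   # remove an item (raises KeyError if it is absent)

print(len(my_set))     # 3
print(sorted(my_set))  # ['dna', 'lipid', 'protein']
```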

4.1.2. Converting types

We can also convert a value from one type to another by using built-in functions. These functions carry the names of the types we want to convert to, such as int(), float(), and str(). See these examples:

float(5)

int(7.63)

Note that when converting a float to an int, Python does not round off the value, but instead drops all the numbers after the decimal point (it “truncates” the floating point number). If we want to convert to an integer and round the value, we can use the round() function:

b = round(7.63)
print(b)

print(type(b))
print(b+0.4)
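Two details are worth knowing here. First, int() truncates toward zero, so for a negative number it moves up rather than down. Second, round() rounds an exact half to the nearest even integer (often called "banker's rounding"), so round(2.5) gives 2. A quick check:

```python
print(int(7.63))    # 7
print(int(-7.63))   # -7: truncation moves toward zero
print(round(7.63))  # 8
print(round(2.5))   # 2: an exact half goes to the nearest even number
print(round(3.5))   # 4
```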

This works for conversions between many types. Sometimes, however, you will lose information in the process: for example, by converting a float to an int, we lose all the digits after the decimal point.

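Some conversions are not possible, and Python raises an error instead. For example, Python cannot convert a complex number to a float, because it will not silently discard the imaginary part: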

float(1+1j)

A very useful feature is that Python can convert numbers into strings:

a = 7.54
str(a)
b = a + 1
print(b)

# Hint: how can you explain the type of a?
print(type(a))
print(type(b))

In this code, you might expect an error, since we made a string out of a. However, str(a) returns a new string, and we did not store it in a variable, so a is still a float. What would happen if you had this line instead: a = str(a)?

When you pass a numeric value to the print() function, Python converts it into a string, since print() outputs text. The conversion also works the other way around: as long as a string contains a valid number, you can convert it into a number with int(), float(), or complex().

# Note: the quotation marks make 5.74 a string.
float('5.74')

int('774')

complex('5+3j')
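The string must look exactly like a number of the target type. For example, int() cannot parse a string that contains a decimal point and raises a ValueError. One way around this, sketched below, is to convert to float first and then to int:

```python
text = "7.74"

value = float(text)  # the string '7.74' becomes the float 7.74
print(value)         # 7.74
print(int(value))    # 7: the float is then truncated to an int

# int(text) would raise a ValueError, because '7.74' is not written as a whole number
```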

We can also convert immutable tuples to mutable lists and back again:

a = (4,5)
print(type(a))

a = list(a)
print(type(a))

a = tuple(a)
print(type(a))
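Converting back and forth offers one way to change an item stored in a tuple: convert it to a list, modify the list, and convert it back. Strictly speaking, this builds a new tuple; the original tuple object is never modified. A sketch using the Person_2 tuple from earlier (the name temp is arbitrary):

```python
Person_2 = ('Johan', 'Vos', 'Mekelweg', 5, 'Delft')

temp = list(Person_2)   # a mutable list with the same items
temp[1] = 'Jansen'      # change the family name
Person_2 = tuple(temp)  # a new tuple, assigned to the same name

print(Person_2)         # ('Johan', 'Jansen', 'Mekelweg', 5, 'Delft')
```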

We can also make several type changes in one go. For instance, the following code first converts the floating point number 5.1 into an integer using the inner function int(), and then converts that integer into a string with the outer function str(). The outcome is therefore a string.

a = str(int(5.1))
print(a)
print(type(a))

Exercise 4.3

Define several variables, each of a different type. You can name them a, b, etc. Define variables of as many different types as possible, i.e., all the examples from above and maybe a few more. Try to convert them to other types, and then use the type() function to see the effect.

Were all conversions possible?
What did you observe in converting bool into int, and the other way around?

# Your code here

Exercise 4.4

Define a list that contains some duplicate elements. Try to convert it to a set using set(). What does your set look like compared to the list? Use the cell below to write your code.

# Your code here


4.1.3. Names of variables

So far, we have not talked about the names of variables. On this page, we used names such as a and b for variables, and each time, the previous value of the variable with the same name was overwritten. Furthermore, what does a refer to? Is it an acceleration, or a coefficient?

If you want to be able to read your code next year, or if you want others to be able to read your code, you have to come up with proper names for your variables. We discuss variable names and other tips and conventions in the next chapter (good coding practices and PEP 8 style guidelines for Python).

4.1.4. Tab completion

Computer programmers often forget things, and often they forget variables they have defined. Also, programmers like to save typing whenever they can.

For this reason, you can use a feature called Tab completion. Recall that Tab completion also works in the terminal.

The idea is that if you start typing part of the name of a variable or function and then press the Tab key, VS Code will bring up a list of the variable and function names that match what you have started to type. If only one matches, it will automatically type the rest for you. If multiple names match, it will offer you a list: you can either keep typing until what you have typed is unique and then press Tab again, or you can use the arrow keys to select what you want.

Try this in VS Code.