r'''
In this exercise we will import data, either a whole file or part of one, from
different file formats. For this, we will use the pandas package
(conventionally imported with `import pandas as pd`).

To start: Navigate to and open the data file in the terminal. What does it look like? 
What kind of delimiter do you see? Are there header rows?

Exercise A: CSV file

To import a CSV file, we will use pd.read_csv(r"\path\to\file\filename.csv"). 
Note that you may need to define the delimiter. Use AI to get an overview of 
the extra arguments this function accepts.

You received the file "uniprotkb_organism_id_9606_AND_reviewed_2024_06_19.csv" 
with this exercise. Read the file using pandas and find out what kind of object 
the data is stored in using the built-in `type()` function. Then print the data 
and analyse the output. 

What information about the data can you find? 
E.g., how many rows and columns does your data have?
'''
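
# Illustrative sketch (not the exercise solution): the same steps on a tiny,
# made-up CSV held in memory, so it runs without any file on disk. The names
# `toy_csv` and `toy_df` and all values are assumptions for illustration only;
# for the exercise, pass the path of the real file to pd.read_csv() instead.
import io

import pandas as pd

toy_csv = io.StringIO("Entry,Length\nX0001,120\nX0002,85\nX0003,301\n")
toy_df = pd.read_csv(toy_csv)

print(type(toy_df))  # the class pandas uses to store the table
print(toy_df.shape)  # tuple of (number of rows, number of columns)
toy_df.info()        # column names, non-null counts, and dtypes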

# Your code here


r'''
Exercise B: Excel file 

Instead of a CSV file, you can also read Excel files using 
pd.read_excel(r"\path\to\file\filename.xlsx"). Try it! 
We've provided you with an Excel file carrying the same name as the above CSV.
'''

# Your code here


'''
Exercise C: Importing with parameters

pandas stores data in a so-called DataFrame, which is a 2D table with indexed 
rows and columns. When printing the data, you will find the row index to the 
left of each row (starting at 0, 1, 2, ...). The top-most row gives the column 
names. For the imported data, looking only at the DataFrame output (not the 
original file), give the names of all the columns. 

# Solution: Entry, Reviewed, Entry name, Protein names, Gene Names, Organism, Length

There are a few important parameters to `pd.read_csv()` that you can play 
around with. A non-exhaustive list:
- `sep`: The default separator between columns in CSV files is a comma, but if 
you have a different delimiter, specify it using `sep =`.
- `header`: Gives the row that has the column names. The default is to use the 
0-th row as column names. If your data has no headers, use `header = None`. Use 
`header = 0` paired with `names = [your, column, names]` to manually override 
the header names.
- `usecols`: Select the columns to be imported. You can use either the column 
numbers (e.g., `usecols = [1,2]`) or the headers (e.g., `usecols=["Reviewed","Entry Name"]`).
- `nrows`: The number of data rows to read, counted from the top of the file.

For this exercise, import the first two columns and first 30 rows of the data.
Use the same CSV file as in Exercise A.
'''
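
# Illustrative sketch of the parameters listed above, applied to a made-up,
# semicolon-separated table held in memory (not the exercise file). The names
# `toy_text`, `subset`, and `renamed` and all values are hypothetical.
import io

import pandas as pd

toy_text = "id;score;group\na;1;x\nb;2;y\nc;3;x\nd;4;y\n"

# sep=";" matches the delimiter; usecols and nrows keep only the first two
# columns and the first three data rows.
subset = pd.read_csv(io.StringIO(toy_text), sep=";", usecols=[0, 1], nrows=3)
print(subset)

# header=0 marks the first row as the header to discard, and names supplies
# the replacement column names.
renamed = pd.read_csv(
    io.StringIO(toy_text), sep=";", header=0, names=["sample", "value", "label"]
)
print(renamed.columns.tolist())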

# Your code here

'''
Exercise D: Text files

Of course, CSV and Excel aren't the only file types you might need to read. 
Another common file type is a plain-text file. For this, we will use the 
"Phosphorylation_Y.txt" file you received with this exercise. It describes 
tyrosine phosphorylation sites, including the UniProt IDs, tyrosine site, 
phosphorylated motif, and more. 

For importing, we will use `pd.read_table()`, where you put your file location 
between the parentheses as before. Does it accept the same arguments as 
`read_csv()` and `read_excel()`? You can also check with AI!
'''
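
# Illustrative sketch: pd.read_table() works like pd.read_csv() but uses a tab
# as its default separator. The tab-separated text below is made up, and the
# names `toy_tsv`, `from_table`, and `from_csv` are hypothetical. Whether the
# exercise file is tab-separated is an assumption to check by opening it.
import io

import pandas as pd

toy_tsv = "site\tmotif\nY10\tAAAYAAA\nY25\tCCCYCCC\n"

from_table = pd.read_table(io.StringIO(toy_tsv))
from_csv = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# Both calls should produce the same DataFrame.
print(from_table.equals(from_csv))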

# Your code here

'''
Exercise E: Writing files

We can also write files directly from Python into a text, CSV, or Excel format. 
To do that, we use `dataframe_name.to_*()`, where `dataframe_name` is the name 
of your DataFrame and `*` is the target format: `to_csv()` for CSV (and, with a 
different `sep`, other delimited text files) or `to_excel()` for Excel. 

Use this to write the first two columns and first 30 rows of the 
phosphorylation data into a CSV file. Don't forget to specify the file path 
when you save the file! 
'''
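
# Illustrative sketch: selecting part of a DataFrame and writing it to CSV.
# The data, the names `toy_out`, `out_path`, and `written`, and the file name
# "toy_subset.csv" are all made up. A temporary directory is used so the
# example leaves nothing behind; for the exercise, use a path of your choice.
import tempfile
from pathlib import Path

import pandas as pd

toy_out = pd.DataFrame(
    {
        "site": ["Y10", "Y25", "Y40"],
        "motif": ["AAAYAAA", "CCCYCCC", "GGGYGGG"],
        "score": [1, 2, 3],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "toy_subset.csv"
    # .iloc[:2, :2] keeps the first two rows and first two columns;
    # index=False leaves the row index out of the written file.
    toy_out.iloc[:2, :2].to_csv(out_path, index=False)
    written = out_path.read_text()

print(written)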

# Your code here