import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Data loading

Let’s load a dataset of daily precipitation in Seattle in 2014

# the easiest way: pass the URL directly to read_csv
# (a local file path works too)
URL = "http://www-sop.inria.fr/members/Arnaud.Legout/formationPython/Exos/Seattle2014.csv"

# don't worry, we will come back to this line when we talk about pandas.
# for now, it just loads the PRCP column as an ndarray
rainfall = pd.read_csv(URL)["PRCP"].to_numpy()

# other solution to get the remote file with urllib
# from urllib.request import urlopen
# with open("Seattle2014.csv", "w", encoding='utf-8') as f:
#    with urlopen(URL) as u:
#        f.write(u.read().decode('utf-8'))

# then we extract the precipitation column with pandas
# rainfall is an array holding one precipitation value
# for each day of 2014
# rainfall = pd.read_csv('Seattle2014.csv')['PRCP'].to_numpy()
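To see what the `read_csv(...)["PRCP"].to_numpy()` chain does without downloading anything, here is a minimal sketch on an in-memory CSV. The three rows and the `STATION` and `DATE` columns are made up for illustration; only the `PRCP` column name comes from the cell above.

```python
import io
import pandas as pd

# hypothetical three-row extract, NOT the real Seattle file
csv_text = "STATION,DATE,PRCP\nX,20140101,0\nX,20140102,41\nX,20140103,15\n"

# read_csv accepts a file-like object as well as a URL or a path
df = pd.read_csv(io.StringIO(csv_text))

# selecting one column gives a pandas Series,
# and to_numpy() turns that Series into a plain ndarray
sample = df["PRCP"].to_numpy()
print(type(sample).__name__, sample.shape)
```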

Let’s visualize

[assignment]: plot the amount of rain (in mm) over time; make sure you put a proper label on both axes, and a title on the figure

# your code here

Let’s answer the following questions

What are the shape and dtype of the ndarray?

# your code here

How many rainy days?

# your code here

Average precipitation over the year?

# your code here

Average precipitation on the rainy days?

# your code here

Mean precipitation in January?

# your code here

Mean precipitation in January on the rainy days?

# your code here
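As a hint for the questions above, here is a minimal sketch of boolean masking and slicing on a made-up array. The values are invented, and the idea that a month is a contiguous slice assumes the array holds one value per day starting on January 1.

```python
import numpy as np

# made-up values for illustration, not taken from the Seattle file
toy = np.array([0, 12, 0, 30, 6, 0])

rainy = toy > 0                  # boolean mask, True where it rained
n_rainy = rainy.sum()            # True counts as 1, so this counts rainy days
mean_all = toy.mean()            # mean over every day
mean_rainy = toy[rainy].mean()   # mean over the rainy days only

# a period of the year is just a slice; here, the first three days
first = toy[:3]
mean_first_rainy = first[first > 0].mean()

print(n_rainy, mean_all, mean_rainy, mean_first_rainy)
```

The same pattern should carry over to `rainfall`, with a slice covering the 31 days of January.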

A transition to pandas

# in practice we don't do all this by hand; here is what we do instead
# we start by converting the array to a pandas Series
s = pd.Series(rainfall)

# then we convert the index to the real dates
s.index = pd.to_datetime(s.index, unit='D',
                         origin=pd.Timestamp('2014-01-01'))

# possibly resample per month to get the total monthly rain
s = s.resample('ME').sum()
# then plot

%matplotlib ipympl

s.plot.bar()
plt.xlabel('month')
plt.ylabel('mm')
plt.title('Monthly rainfall in Seattle, 2014')
fig = plt.gcf()
fig.autofmt_xdate()
# plt.show() # if in a terminal
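The index conversion can be checked on a toy Series. This is a sketch on made-up data (one unit of rain every day for 59 days). It groups by `index.month` rather than calling `resample`, so that it does not depend on which frequency alias the installed pandas version accepts.

```python
import numpy as np
import pandas as pd

# made-up data: 1 unit of rain per day for 59 days
# (31 days of January followed by 28 days of February)
toy = pd.Series(np.ones(59))

# the integer positions 0..58 are read as day offsets from the origin date
toy.index = pd.to_datetime(toy.index, unit="D",
                           origin=pd.Timestamp("2014-01-01"))

# total per calendar month, keyed by month number
monthly = toy.groupby(toy.index.month).sum()

print(toy.index[0], toy.index[-1])
print(monthly)
```

Position 0 maps to the origin date itself, which is why the origin has to be January 1 of the year the data covers.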