
in this TP we work on

here’s an example of the outputs we will obtain

imports

import pandas as pd
import matplotlib.pyplot as plt
  1. make sure to use matplotlib in interactive mode - aka ipympl

# your code
  1. optional: set up itables, so that we can have scrollable tables

# your code

the data

we have a table of events, each with a begin (beg) and end time; in addition each is attached to a country

leases = pd.read_csv("data/leases.csv")
leases.head(10)

adapt the type of each column

the column dtypes surely need some care

# your code
# check it

leases.dtypes
beg        object
end        object
country    object
dtype: object
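one possible way to go about it, as a minimal sketch: the two timestamps are parsed with pd.to_datetime and the country becomes a category. The inline CSV, its date format and the name `sample` are assumptions made up for illustration; the real data/leases.csv may need a different parsing.

```python
import io
import pandas as pd

# hypothetical sample mimicking the three columns of data/leases.csv
csv_text = """beg,end,country
2021-01-04 10:00:00,2021-01-04 12:00:00,France
2021-01-05 09:00:00,2021-01-05 09:30:00,Italy
"""
sample = pd.read_csv(io.StringIO(csv_text))

# timestamps become datetime64, the repeated country names become a category
sample["beg"] = pd.to_datetime(sample["beg"])
sample["end"] = pd.to_datetime(sample["end"])
sample["country"] = sample["country"].astype("category")

print(sample.dtypes)
```

with proper datetime columns, subtracting beg from end directly yields timedeltas, which the later steps rely on.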

sanity check

check that the data is well-formed, i.e. the end timestamp happens after beg

# your code
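a minimal sketch of such a check, on a made-up two-row table (`sample`, `malformed` and `well_formed` are illustrative names): it selects the rows whose end does not come strictly after beg, and the data is well-formed when that selection is empty.

```python
import pandas as pd

# hypothetical sample, already converted to datetimes
sample = pd.DataFrame({
    "beg": pd.to_datetime(["2021-01-04 10:00", "2021-01-05 09:00"]),
    "end": pd.to_datetime(["2021-01-04 12:00", "2021-01-05 09:30"]),
})

# rows where the end does not come strictly after the beginning
malformed = sample[sample["end"] <= sample["beg"]]
well_formed = malformed.empty
print(f"{len(malformed)} malformed rows")
```

keeping the offending rows around, rather than just a boolean, makes it easier to inspect them if the check fails.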

are there any overlapping events?

it turns out that no two events overlap, but write code that checks that this is true

# your code
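one way to check this, sketched under the assumption that two leases that merely touch (one ends exactly when the next begins) do not count as overlapping: once the rows are sorted by beg, if any two leases overlap then two consecutive ones do, so it is enough to compare each beg with the previous end. The helper name `has_overlap` and the sample are hypothetical.

```python
import pandas as pd

# hypothetical sample, deliberately not sorted
sample = pd.DataFrame({
    "beg": pd.to_datetime(["2021-01-05 09:00", "2021-01-04 10:00", "2021-01-04 12:00"]),
    "end": pd.to_datetime(["2021-01-05 09:30", "2021-01-04 12:00", "2021-01-04 13:00"]),
})

def has_overlap(df: pd.DataFrame) -> bool:
    ordered = df.sort_values("beg")
    # end of the lease that starts just before; NaT for the first row
    previous_end = ordered["end"].shift()
    # comparing with NaT yields False, so the first row never triggers
    return bool((ordered["beg"] < previous_end).any())

print(has_overlap(sample))
```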

timespan

what is the timespan covered by the dataset (earliest and latest events, and the duration in between)?

# your code
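a possible sketch, assuming the timespan runs from the earliest beg to the latest end (the sample and the variable names are made up for illustration):

```python
import pandas as pd

# hypothetical sample, already converted to datetimes
sample = pd.DataFrame({
    "beg": pd.to_datetime(["2021-01-04 10:00", "2021-01-05 09:00"]),
    "end": pd.to_datetime(["2021-01-04 12:00", "2021-01-05 09:30"]),
})

earliest = sample["beg"].min()
latest = sample["end"].max()
# subtracting two timestamps gives a pd.Timedelta
timespan = latest - earliest
print(f"from {earliest} to {latest}: {timespan}")
```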

aggregated duration

so, given that there is no overlap, we can assume these events are “reservations” attached to a single resource (hence the term lease)
write code that computes the overall reservation time, as well as the average usage ratio over the whole timespan

# your code
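as a sketch of the arithmetic involved: the overall reservation time is the sum of (end - beg), and dividing it by the timespan gives the usage ratio as a plain float. The sample and the names `total` and `ratio` are assumptions for illustration.

```python
import pandas as pd

# hypothetical sample, already converted to datetimes
sample = pd.DataFrame({
    "beg": pd.to_datetime(["2021-01-04 10:00", "2021-01-05 09:00"]),
    "end": pd.to_datetime(["2021-01-04 12:00", "2021-01-05 09:30"]),
})

# total reserved time: sum of the individual durations
total = (sample["end"] - sample["beg"]).sum()
# overall window, from the first beginning to the last end
timespan = sample["end"].max() - sample["beg"].min()
# dividing two timedeltas yields a unitless float
ratio = total / timespan
print(f"reserved {total} out of {timespan}, i.e. {ratio:.1%}")
```

summing the durations only measures the real occupation of the resource because the leases do not overlap.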

visualization - grouping by period

usage by period

grouping by period (by week, by month or by year), display the total usage in each period
(when ambiguous, use the beg column to decide whether a lease falls in one period or another)

# your code
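one way to compute the per-period totals, shown here without the plotting part: each lease is assigned to the period that contains its beg, then the durations are summed per period. This is only a sketch; the helper `usage_by_period`, the `duration` column and the sample are hypothetical, and resample() or pd.Grouper would be other reasonable routes.

```python
import pandas as pd

# hypothetical sample: two leases in January, one in February
sample = pd.DataFrame({
    "beg": pd.to_datetime(["2021-01-04 10:00", "2021-01-05 09:00", "2021-02-10 14:00"]),
    "end": pd.to_datetime(["2021-01-04 12:00", "2021-01-05 09:30", "2021-02-10 15:00"]),
})
sample["duration"] = sample["end"] - sample["beg"]

def usage_by_period(df: pd.DataFrame, freq: str) -> pd.Series:
    # freq is a period alias: 'W' (week), 'M' (month) or 'Y' (year)
    periods = df["beg"].dt.to_period(freq)
    # the lease counts entirely in the period of its beg
    return df.groupby(periods)["duration"].sum()

monthly = usage_by_period(sample, "M")
print(monthly)
```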

improve the title and bottom ticks

add a title to your visualisations

also, and this is particularly relevant for the per-week visu, the labels on the horizontal axis are unreadable because there are too many of them
to improve this, you can use matplotlib’s set_xticks() function; you can either figure it out by yourself, or read the few tips below

# let's say, as a rule of thumb
LEGEND = {
    'W': "week",
    'M': "month",
    'Y': "year",
}

SPACES = {
    'W': 12,   # in the per-week visu, show one tick every 12 - so about one every 3 months
    'M': 3,    # one every 3 months
    'Y': 1,    # one tick every year
}
# your code
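the tips above could be put to use along these lines. This is a sketch on made-up monthly totals (the `usage` series is hypothetical), and the Agg backend is selected only so that it also runs outside a notebook; in the notebook itself, drop that line and keep ipympl.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, only for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

LEGEND = {'W': "week", 'M': "month", 'Y': "year"}
SPACES = {'W': 12, 'M': 3, 'Y': 1}

# hypothetical per-month totals in hours, standing in for the real grouped data
usage = pd.Series(
    range(1, 13),
    index=pd.period_range("2021-01", periods=12, freq="M"),
)

freq = 'M'
fig, ax = plt.subplots()
# one bar per period, positioned at 0, 1, 2, ...
ax.bar(range(len(usage)), usage.to_numpy())
ax.set_title(f"usage per {LEGEND[freq]}")

# keep one tick every SPACES[freq] bars, labelled with the period it stands for
positions = list(range(0, len(usage), SPACES[freq]))
ax.set_xticks(positions)
ax.set_xticklabels([str(usage.index[i]) for i in positions], rotation=45)
```

drawing the bars at integer positions keeps the tick arithmetic simple: tick number i always sits under bar number i.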

a function to convert to hours

you are to write a function that converts a pd.Timedelta into a number of hours

  1. read and understand the test code for the details of what is expected

  2. use it to test your own implementation

# your code

def convert_timedelta_to_hours(timedelta: pd.Timedelta) -> int:
    pass
# test it

# if an hour has started even by one second, it is counted
test_cases = ( 
    # input in seconds, expected result in hours
    (0, 0), 
    (1, 1),     (3599, 1),     (3600, 1), 
    (3601, 2),  (7199, 2),     (7200, 2), 
    # 2 hours + 1s -> 3 hours
    (7201, 3),  
    # 3 hours + 2 minutes -> 4 hours
    (pd.Timedelta(3, 'h') + pd.Timedelta(2, 'm'), 4),
    # 2 days -> 48 hours
    (pd.Timedelta(2, 'D'), 48),
)

def test_convert_timedelta_to_hours():
    for seconds, exp in test_cases:
        # convert into pd.Timedelta if not already one
        if not isinstance(seconds, pd.Timedelta):
            timedelta = pd.Timedelta(seconds=seconds)
        else:
            timedelta = seconds
        # compute and compare
        got = convert_timedelta_to_hours(timedelta)
        print(f"with {timedelta=} we get {got} and expected {exp} -> {got == exp}")

test_convert_timedelta_to_hours()
with timedelta=Timedelta('0 days 00:00:00') we get None and expected 0 -> False
with timedelta=Timedelta('0 days 00:00:01') we get None and expected 1 -> False
with timedelta=Timedelta('0 days 00:59:59') we get None and expected 1 -> False
with timedelta=Timedelta('0 days 01:00:00') we get None and expected 1 -> False
with timedelta=Timedelta('0 days 01:00:01') we get None and expected 2 -> False
with timedelta=Timedelta('0 days 01:59:59') we get None and expected 2 -> False
with timedelta=Timedelta('0 days 02:00:00') we get None and expected 2 -> False
with timedelta=Timedelta('0 days 02:00:01') we get None and expected 3 -> False
with timedelta=Timedelta('0 days 03:02:00') we get None and expected 4 -> False
with timedelta=Timedelta('2 days 00:00:00') we get None and expected 48 -> False
# for debugging; this should return 48

convert_timedelta_to_hours(pd.Timedelta(2, 'D'))
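for reference, one implementation that is consistent with the test cases above, offered as a sketch rather than the expected answer: dividing by a one-hour timedelta gives a float number of hours, and the ceiling counts any started hour as a full one.

```python
import math
import pandas as pd

def convert_timedelta_to_hours(timedelta: pd.Timedelta) -> int:
    # dividing two timedeltas yields a float number of hours
    hours = timedelta / pd.Timedelta(hours=1)
    # any started hour counts as a full one, hence the ceiling
    return math.ceil(hours)

print(convert_timedelta_to_hours(pd.Timedelta(2, 'D')))
```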

use it to display totals in hours

keep the same visu, but display the Y axis in hours
by the way, what was the unit in the graphs above?

# your code

grouping by period and region

the following table allows you to map each country into a region

# load it

countries = pd.read_csv("data/countries.csv")
countries.head(3)

a glimpse of the regions

what’s the most effective way to see how many regions there are, and how many countries per region?

# your code
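a likely candidate is value_counts(), sketched below on a made-up extract. The column names `name` and `region` are assumptions, since the layout of data/countries.csv is not shown here.

```python
import pandas as pd

# hypothetical extract: the real data/countries.csv may use other column names
countries_sample = pd.DataFrame({
    "name": ["France", "Italy", "Japan", "Brazil"],
    "region": ["Europe", "Europe", "Asia", "Americas"],
})

# one row per region, with the number of countries it contains
per_region = countries_sample["region"].value_counts()
print(per_region)
print(f"{len(per_region)} regions")
```

a single call answers both questions: the length of the result is the number of regions, and its values are the countries per region.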

attach a region to each lease

your mission is now to show the same graphs while reflecting the relative usage of each region: each bar is split into several colors, one per region (see the expected result below)

most likely your first move is to tag all leases with a region column

# your code
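tagging can be done with a merge, as in this sketch. Both tables and the countries column names `name` and `region` are hypothetical; adapt left_on / right_on to the actual columns.

```python
import pandas as pd

# hypothetical leases, reduced to what matters for the join
leases_sample = pd.DataFrame({
    "beg": pd.to_datetime(["2021-01-04 10:00", "2021-01-05 09:00", "2021-02-10 14:00"]),
    "country": ["France", "Japan", "Italy"],
})
# hypothetical country-to-region mapping
countries_sample = pd.DataFrame({
    "name": ["France", "Italy", "Japan"],
    "region": ["Europe", "Europe", "Asia"],
})

# a left merge keeps every lease, even one whose country is missing from the mapping
tagged = leases_sample.merge(
    countries_sample, left_on="country", right_on="name", how="left")
print(tagged[["beg", "country", "region"]])
```

with how="left", a lease whose country is unknown ends up with a missing region instead of silently disappearing, which is easy to check afterwards with isna().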

visu by period by region

you can now produce the target figures; again, they look like this

# your code
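the data behind such a figure could be shaped as one row per period and one column per region, as in this sketch. The tagged sample is made up, and for simplicity it uses plain float hours rather than the rounded-up hours computed earlier.

```python
import pandas as pd

# hypothetical leases already tagged with a region
tagged = pd.DataFrame({
    "beg": pd.to_datetime(["2021-01-04 10:00", "2021-01-05 09:00", "2021-02-10 14:00"]),
    "end": pd.to_datetime(["2021-01-04 12:00", "2021-01-05 09:30", "2021-02-10 15:00"]),
    "region": ["Europe", "Europe", "Asia"],
})
# duration as a float number of hours (an assumption made for this sketch)
tagged["hours"] = (tagged["end"] - tagged["beg"]) / pd.Timedelta(hours=1)
tagged["period"] = tagged["beg"].dt.to_period("M")

# sum per (period, region), then spread the regions into columns;
# a region with no lease in a period gets 0
table = (
    tagged.groupby(["period", "region"])["hours"]
    .sum()
    .unstack(fill_value=0)
)
print(table)
```

from there, table.plot.bar(stacked=True) draws one bar per period, split into one colored segment per region.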