import pandas as pd
import matplotlib.pyplot as plt

import geopandas as gpd

# activate this if running under jlab
# %matplotlib ipympl

presidential averages

538.com (actually http://fivethirtyeight.com) is a site hosted by ABC News that publishes data about the US presidential election

we’re going to use the data at this URL:

URL = "https://projects.fivethirtyeight.com/polls/data/presidential_general_averages.csv"
# your code

CACHE = "data/DATA.csv"
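one possible way to use a cache file like this is to download the data only when the local copy is missing; the sketch below is a minimal illustration, where `fetch_with_cache` and its `download` parameter are hypothetical names (not part of the original notebook), and `download` exists only so that the function can be tried without network access

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_with_cache(url, cache, download=None):
    """Return the path of a local copy of `url`, downloading it only if missing.

    `download` is an optional callable (url -> bytes); it is a hypothetical
    hook added for illustration, the default goes through urllib.
    """
    cache = Path(cache)
    if not cache.exists():
        # create the data/ folder if needed
        cache.parent.mkdir(parents=True, exist_ok=True)
        if download is None:
            with urlopen(url) as response:
                data = response.read()
        else:
            data = download(url)
        cache.write_bytes(data)
    return cache
```

with such a helper, something like `pd.read_csv(fetch_with_cache(URL, CACHE))` would hit the network on the first run only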

what we want to do is plot the average of the polls for each candidate, as a function of time
in other words, you should obtain something like this - we will arbitrarily focus on the year 2024 only
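as a hint, here is a minimal sketch of the reshaping step on a toy table; it assumes that the data has columns named `date`, `state`, `candidate` and `pct_estimate`, which should be checked against the actual CSV, and all the numbers are made up

```python
import pandas as pd

# toy stand-in for the polls table
# the column names are assumptions, and the values are made up
polls = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-12-31",
        "2024-08-01", "2024-08-01", "2024-08-01", "2024-08-01",
        "2024-08-02", "2024-08-02",
    ]),
    "state": ["Nevada", "Nevada", "Ohio", "Nevada", "Ohio", "Nevada", "Nevada"],
    "candidate": ["Biden", "Harris", "Harris", "Trump", "Trump", "Harris", "Trump"],
    "pct_estimate": [40.0, 45.0, 47.0, 44.0, 46.0, 48.0, 45.0],
})

# keep 2024 only
polls_2024 = polls[polls["date"].dt.year == 2024]

# one row per date, one column per candidate, averaged over the states
per_candidate = polls_2024.pivot_table(
    index="date", columns="candidate", values="pct_estimate", aggfunc="mean",
)
```

from there, calling `per_candidate.plot()` would draw one line per candidate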

# your code

using the interactive view (after all we are using %matplotlib ipympl), zoom into the figure and retrieve

also write a line of code to compute this second date
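a date like this can also be read off the data rather than off the figure; the sketch below, on a toy table with made-up dates and assumed column names `date` and `candidate`, computes the first date on which a given candidate appears, and then keeps only the rows from that date onwards

```python
import pandas as pd

# toy stand-in; column names are assumptions and the dates are made up
polls = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02",
        "2024-01-03", "2024-01-03", "2024-01-04", "2024-01-04",
    ]),
    "candidate": [
        "Biden", "Trump", "Biden", "Trump",
        "Harris", "Trump", "Harris", "Trump",
    ],
})

# earliest date among the rows that concern one candidate
first_date = polls.loc[polls["candidate"] == "Harris", "date"].min()

# restrict to the period starting on that date
after = polls[polls["date"] >= first_date]
```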

# your code

first_harris_date = ...

race end

from this part we will focus on the period after first_harris_date

# your code

how many candidates are still in the data?
make sure to keep only the 2 most famous ones
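for this kind of question, `nunique()` and `isin()` are usually enough; here is a minimal sketch on a toy table, where the column name `candidate` and the way the names are spelled are assumptions to be checked against the actual data

```python
import pandas as pd

# toy stand-in; the column name and the spelling of the names are assumptions
polls = pd.DataFrame({
    "candidate": ["Harris", "Trump", "Kennedy", "Harris", "Trump"],
    "pct_estimate": [47.0, 45.0, 5.0, 48.0, 44.0],
})

# how many distinct candidates
n_candidates = polls["candidate"].nunique()

# keep only the rows about the two candidates of interest
KEEP = ["Harris", "Trump"]
two_way = polls[polls["candidate"].isin(KEEP)]
```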

# your code

geographic rendering

in this section we will produce a summary map, which looks like this
the color depicts the ratio between Harris’s average score over time on the one hand, and Trump’s on the other
the tooltips also expose more details on the individual results
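to make the ratio concrete, here is a minimal sketch on a toy table: average each candidate's score over time within each state, then divide one column by the other; the column names `state`, `candidate` and `pct_estimate` are assumptions, and the numbers are made up

```python
import pandas as pd

# toy stand-in; column names are assumptions and the values are made up
polls = pd.DataFrame({
    "state": ["Nevada"] * 4 + ["Ohio"] * 4,
    "candidate": ["Harris", "Harris", "Trump", "Trump"] * 2,
    "pct_estimate": [48.0, 50.0, 49.0, 49.0, 44.0, 44.0, 55.0, 55.0],
})

# average over time: one row per state, one column per candidate
means = (
    polls.groupby(["state", "candidate"])["pct_estimate"]
    .mean()
    .unstack("candidate")
)

# above 1 means Harris leads on average, below 1 means Trump does
means["ratio"] = means["Harris"] / means["Trump"]
```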

first we need a definition of the various US states; there is one here

# no longer easily readable by geopandas because of an SSL certificate issue
US_STATES_SHAPEFILE_URL = "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_20m.zip"

US_STATES_SHAPEFILE_CACHE = "data/us-states.zip"

and we can load it like so:

# so instead of doing this
# gdf = gpd.read_file(US_STATES_SHAPEFILE_URL)

# we'll do this
gdf = gpd.read_file(US_STATES_SHAPEFILE_CACHE)

# and we get this
gdf.head()

so as you can see, this is almost like a regular dataframe, except for the geometry column, which holds a geographic entity; hence the term geo-dataframe

digression 2: using altair to produce a geographic visualization

of course you might have to install altair... how do you go about doing that again?

import altair as alt

# this is for rendering altair charts within the notebook
alt.renderers.enable("html")
# to show a geographic map from that geo-df

alt.Chart(gdf).mark_geoshape()
# or we can also use it like this if we prefer

chart = (
    alt.Chart(gdf)
    .mark_geoshape()
)

chart.display()

now in terms of presentation, this is a little suboptimal; let’s improve it a bit

(
    alt.Chart(gdf)
    .mark_geoshape()
    .properties(width=800)
    .project('albersUsa')
)

now, the initial geo-dataframe has some numeric columns that we can use to color the map!

for example, there are AWATER and ALAND - that I take it mean area of water and area of land respectively
and we can use one of these to color the different states

for that, just like for simpler altair plots, we call encode() like so

(
    alt.Chart(gdf)
    .mark_geoshape()
    .encode(
        color="ALAND:Q",          # Q stands for quantitative
    )
    .properties(width=800)
    .project('albersUsa')
)
# or if we prefer, same result essentially
# but this way we can be more descriptive

(
    alt.Chart(gdf)
    .mark_geoshape()
    .encode(
        color=alt.Color(field="ALAND", type="quantitative", title="land area")
    )
    .properties(width=800)
    .project('albersUsa')
)
# also useful with altair, we can give a `tooltip` parameter to encode
# and this shows when your mouse hovers on a state

(
    alt.Chart(gdf)
    .mark_geoshape()
    .encode(
        color=alt.Color(field="AWATER", type="quantitative", title="water area"),
        # and we can show there anything from the table
        tooltip=["NAME", "ALAND"],
    )
    .properties(width=800)
    .project('albersUsa')
)

back to our data

given this knowledge, you should be able to produce our target graph, namely again
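one ingredient that is probably needed here is a way to attach per-state numbers to the states table; a minimal sketch of that join follows, where a plain pandas dataframe stands in for `gdf`, the `state` and `ratio` column names are assumptions, and all values are made up

```python
import pandas as pd

# stand-in for the geo-dataframe: only the NAME column matters here
# (the ALAND values are made up)
states = pd.DataFrame({
    "NAME": ["Nevada", "Ohio", "Texas"],
    "ALAND": [1, 2, 3],
})

# stand-in for the per-state summary; column names are assumptions
ratios = pd.DataFrame({
    "state": ["Nevada", "Ohio"],
    "ratio": [1.0, 0.8],
})

# a left join keeps every state, even those without any poll data
merged = states.merge(ratios, left_on="NAME", right_on="state", how="left")
```

when the left-hand side is an actual geo-dataframe, `merge` returns a geo-dataframe as well, so the result can be passed to `alt.Chart()` and the new column used in `encode()` for the color and the tooltip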

# your code

focusing on swing states

from the graph above, keep only the following states

SWING_STATES = [
    'Nevada',
    'Arizona',
    'Wisconsin',
    'Michigan',
    'Pennsylvania',
    'Georgia',
    'North Carolina',
]
# your code