import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as snsthe data¶
Here is a simple dataset of three tables.
import pandas as pd
wifi_table = pd.DataFrame(
{
"SSID_ID": [1, 2, 2, 5, 3, 1],
"BSSID_ID": [1, 2, 2, 5, 3, 1],
"DBM": [-34, -36, -80, -60, -55, -48],
},
index=pd.date_range("3 june 2021", periods=6),
)
ssid_table = pd.DataFrame(
{
"ID": [1, 2, 3, 4, 5, 6],
"SSID": ["box bill", "sam", "orange1884", "FreeWifi", "sfr43", "Eduroam"],
}
)
bssid_table = pd.DataFrame(
{
"ID": [1, 2, 3, 4, 5, 6],
"BSSID": [
"10:10:10:10:20",
"30:30:20:11:15",
"18:34:26:45:12",
"11:54:65:55:23",
"45:43:22:43:54",
"44:35:33:22:11",
],
}
)
print(f"**wifi_table**\n{wifi_table}\n**ssid_table**\n{ssid_table}\n**bssid_table**\n{bssid_table}")**wifi_table**
SSID_ID BSSID_ID DBM
2021-06-03 1 1 -34
2021-06-04 2 2 -36
2021-06-05 2 2 -80
2021-06-06 5 5 -60
2021-06-07 3 3 -55
2021-06-08 1 1 -48
**ssid_table**
ID SSID
0 1 box bill
1 2 sam
2 3 orange1884
3 4 FreeWifi
4 5 sfr43
5 6 Eduroam
**bssid_table**
ID BSSID
0 1 10:10:10:10:20
1 2 30:30:20:11:15
2 3 18:34:26:45:12
3 4 11:54:65:55:23
4 5 45:43:22:43:54
5 6 44:35:33:22:11
merge¶
Let’s start with a simple merge of wifi_table with ssid_table (on SSID_ID and ID), and with bssid_table (on BSSID_ID and ID). You must keep the index of wifi_table.
# your codeanother dataset¶
Here is a new dataset
df1 = pd.DataFrame({'name': ['Bob', 'Lisa', 'Sue'],
'pulse': [70, 63, 81]},
index=[123, 354, 165])
df2 = pd.DataFrame({'name': ['Eric', 'Bob', 'Marc'],
'weight': [60, 100, 70]},
index=[654, 123, 664])merge¶
Let’s outer merge the tables df1 and df2 on name and preserving both indexes. Name the index of df1 ID_old and the index of df2 ID_new.
# your codeNow, consider that ID_new is the new person ID that must be kept, but that for some person, the new ID is missing. In that case, this missing ID must be replaced with the old person ID ID_old. Finally, set the new person ID as the index.
Hint: consider combine_first
# your code