Week 7 - Other libraries and cool things

[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

seaborn

seaborn is a Python data visualisation library for making statistical graphics. It is built on top of matplotlib and integrates very closely with pandas.

Exploratory visualisations are often much easier with seaborn. For example, with only a few lines of code, we can visualise five columns from the Titanic dataset (Age, Fare, Embarked, Sex and Survived) in a single figure.

[2]:
sns.set_theme(context='notebook', style='darkgrid')

df = pd.read_csv('../data/titanic.csv')

g = sns.relplot(  # relplot returns a FacetGrid, not a single Axes
    data=df,
    x='Age', y='Fare', col='Embarked',
    hue='Sex',
    style='Survived',
    markers={0: 'X', 1: 'o'},
);
_images/07_other_libraries_3_0.png
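
Because seaborn is built on top of matplotlib, the object returned by a figure-level function such as relplot still exposes ordinary matplotlib Axes. The sketch below illustrates this on a small made-up DataFrame (the `toy` data and its column names are assumptions for illustration, not the Titanic file used above).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: three ports, two rows each
toy = pd.DataFrame({
    'age': [22, 38, 26, 35, 54, 2],
    'fare': [7.25, 71.28, 7.93, 53.10, 51.86, 21.08],
    'port': ['S', 'C', 'S', 'S', 'Q', 'Q'],
})

# One facet (column) per unique value of 'port'
g = sns.relplot(data=toy, x='age', y='fare', col='port')

print(type(g).__name__)  # FacetGrid
print(g.axes.shape)      # (1, 3): one row of three matplotlib Axes

# Each facet is a regular matplotlib Axes, so the usual methods apply
for facet_ax in g.axes.flat:
    facet_ax.set_ylim(0, 100)

# FacetGrid also offers helpers that act on every facet at once
g.set_axis_labels('Age (years)', 'Fare')
```

Keeping a reference to the returned grid is what makes this kind of follow-up tweaking possible.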

Matplotlib figure anatomy

A matplotlib figure is a collection of Artist objects stored together in a logical parent-child hierarchy. Here’s a neat way to visualise it.

[3]:
from matplotlib.artist import Artist

# Make a basic example figure
fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(range(100), range(100), label='A diagonal line')
ax.set(
    xlabel='The x-axis',
    ylabel='The y-axis',
    title='Example figure'
)
ax.legend()
ax.annotate(
    text='This is the halfway point',
    xy=(50, 50),
    xytext=(20, 80),
    arrowprops={'width':1, 'facecolor':'k', 'edgecolor':'k'}
)

# A function to print all of the Artists, indented by depth in the hierarchy
def recursive_get_children(artist, depth=0):
    if isinstance(artist, Artist):
        print('  ' * depth + str(artist))
        for child in artist.get_children():
            recursive_get_children(child, depth + 2)

# Call the function on our figure
recursive_get_children(fig)
Figure(600x600)
    Rectangle(xy=(0, 0), width=1, height=1, angle=0)
    AxesSubplot(0.125,0.11;0.775x0.77)
        Line2D(A diagonal line)
        Annotation(50, 50, 'This is the halfway point')
        Spine
        Spine
        Spine
        Spine
        XAxis(75.0,65.99999999999999)
            Text(0.5, 0, 'The x-axis')
            Text(1, 0, '')
            <matplotlib.axis.XTick object at 0x7fdb7b0a8460>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(0, 1, '')
            <matplotlib.axis.XTick object at 0x7fdb7b0a8430>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(0, 1, '')
            <matplotlib.axis.XTick object at 0x7fdb7b0d0790>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(0, 1, '')
            <matplotlib.axis.XTick object at 0x7fdb7b0db8b0>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(0, 1, '')
            <matplotlib.axis.XTick object at 0x7fdb68150040>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(0, 1, '')
            <matplotlib.axis.XTick object at 0x7fdb68150790>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(0, 1, '')
            <matplotlib.axis.XTick object at 0x7fdb68156040>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(0, 1, '')
            <matplotlib.axis.XTick object at 0x7fdb68150f40>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(0, 1, '')
        YAxis(75.0,65.99999999999999)
            Text(0, 0.5, 'The y-axis')
            Text(0, 0.5, '')
            <matplotlib.axis.YTick object at 0x7fdb7b0a8f10>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(1, 0, '')
            <matplotlib.axis.YTick object at 0x7fdb7b0a8ca0>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(1, 0, '')
            <matplotlib.axis.YTick object at 0x7fdb7b0db9a0>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(1, 0, '')
            <matplotlib.axis.YTick object at 0x7fdb681568b0>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(1, 0, '')
            <matplotlib.axis.YTick object at 0x7fdb6815c040>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(1, 0, '')
            <matplotlib.axis.YTick object at 0x7fdb6815c790>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(1, 0, '')
            <matplotlib.axis.YTick object at 0x7fdb68165040>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(1, 0, '')
            <matplotlib.axis.YTick object at 0x7fdb68165670>
                Line2D()
                Line2D()
                Line2D()
                Text(0, 0, '')
                Text(1, 0, '')
        Text(0.5, 1.0, 'Example figure')
        Text(0.0, 1.0, '')
        Text(1.0, 1.0, '')
        Legend
            <matplotlib.offsetbox.VPacker object at 0x7fdb7b0d0d00>
                <matplotlib.offsetbox.TextArea object at 0x7fdb7b0d0ee0>
                    Text(0, 0, '')
                <matplotlib.offsetbox.HPacker object at 0x7fdb7b0d0d90>
                    <matplotlib.offsetbox.VPacker object at 0x7fdb7b0d0d30>
                        <matplotlib.offsetbox.HPacker object at 0x7fdb7b0d0d60>
                            <matplotlib.offsetbox.DrawingArea object at 0x7fdb7b0d0850>
                                Line2D(A diagonal line)
                            <matplotlib.offsetbox.TextArea object at 0x7fdb7b0d0820>
                                Text(0, 0, 'A diagonal line')
            FancyBboxPatch((0, 0), width=1, height=1)
        Rectangle(xy=(0, 0), width=1, height=1, angle=0)
_images/07_other_libraries_5_1.png
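
The recursive walk above can also be done with the `findobj` method that every Artist provides. The following is a minimal sketch on a fresh, smaller figure (the names `demo_fig`, `demo_ax`, `artist_counts` and `titles` are illustrative only): it counts the artists by type and then filters for one specific Text object.

```python
from collections import Counter

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.text import Text

demo_fig, demo_ax = plt.subplots()
demo_ax.plot(range(10), range(10), label='A line')
demo_ax.set_title('Example figure')

# With no arguments, findobj() returns every Artist in the hierarchy,
# including the figure itself
artist_counts = Counter(type(a).__name__ for a in demo_fig.findobj())
print(artist_counts.most_common(3))

# A callable can be passed as a filter; here we keep only the Text artist
# that holds the title string
titles = demo_fig.findobj(
    lambda a: isinstance(a, Text) and a.get_text() == 'Example figure'
)
print(titles)
```

This is handy when you want to find an element without knowing exactly where it sits in the tree.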

Now, to demonstrate the power of matplotlib, let’s traverse this hierarchy in true object-oriented fashion and make some changes to a single element.

[4]:
fig.axes[0].get_xticklabels()[4].set(
    color='r',
    style='italic',
    weight='bold',
    size=42,
    family='Comic Sans MS'
)

fig
[4]:
_images/07_other_libraries_7_0.png

This may seem like a silly exercise, but it reveals much about matplotlib. What else about the plot can you change?
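
As a starting point for that question, here is a small sketch that edits three other elements through the same object hierarchy: the plotted line, two of the spines, and the title. It builds its own figure (`demo_fig` and `demo_ax` are illustrative names) so that it runs independently of the cells above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

demo_fig, demo_ax = plt.subplots(figsize=(6, 6))
demo_ax.plot(range(100), range(100))
demo_ax.set_title('Example figure')

# The plotted line is a Line2D stored on the Axes
line = demo_ax.lines[0]
line.set(color='g', linewidth=3, linestyle='--')

# Spines are looked up by side; hide the top and right ones
for side in ('top', 'right'):
    demo_ax.spines[side].set_visible(False)

# The title is a Text artist, so it accepts the same kind of set() call
demo_ax.title.set(text='Edited title', color='purple')
```

Any property with a matching `set_...` method on the artist can be passed to `set()` as a keyword argument.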

Animations with matplotlib

With matplotlib, it is also possible to make animated plots. Here’s one that shows the number of cycling accidents in Great Britain per year. Note that you may need to install some additional libraries (for example, a movie writer such as ffmpeg) for this to work in a Jupyter notebook.

[5]:
df
[5]:
     PassengerID  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0    1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1    2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2    3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3    4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4    5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
...  ...          ...       ...     ...                                                ...     ...   ...    ...    ...               ...      ...    ...
886  887          0         2       Montvila, Rev. Juozas                              male    27.0  0      0      211536            13.0000  NaN    S
887  888          1         1       Graham, Miss. Margaret Edith                       female  19.0  0      0      112053            30.0000  B42    S
888  889          0         3       Johnston, Miss. Catherine Helen "Carrie"           female  NaN   1      2      W./C. 6607        23.4500  NaN    S
889  890          1         1       Behr, Mr. Karl Howell                              male    26.0  0      0      111369            30.0000  C148   C
890  891          0         3       Dooley, Mr. Patrick                                male    32.0  0      0      370376            7.7500   NaN    Q

891 rows × 12 columns

[6]:
from matplotlib.animation import FuncAnimation

df = (
    pd.read_csv('../data/gb_cycling_accidents.csv')
    .assign(index=lambda df_: pd.DatetimeIndex(df_.Date + ' ' + df_.Time))
    .set_index('index')
    .assign(Year=lambda df_: df_.index.year)
    .groupby(['Year', 'Gender'])['Accident_Index']
    .count()
    .unstack()
)

fig, ax = plt.subplots(figsize=(8, 4))

ln_male, = ax.plot([], [], 'ro-')
ln_female, = ax.plot([], [], 'bo-')
ln_other, = ax.plot([], [], 'go-')

def init():
    ax.set_ylim(-1000, df.Male.max()*1.05)
    ax.set_xlim((df.index.min(), df.index.max()))
    ax.set_xlabel('Year')
    ax.set_ylabel('Number of accidents')
    ax.set_title('Cycling accidents in Great Britain (1979-2018)')
    ax.legend([ln_male, ln_female, ln_other], ['Males', 'Females', 'Other'])
    return ln_male, ln_female, ln_other,

def update(frame):
    # Show rows 0..frame inclusive, so the final frame includes the last year
    data = df.iloc[:frame + 1]
    ln_male.set_data(data.index, data.Male)
    ln_female.set_data(data.index, data.Female)
    ln_other.set_data(data.index, data.Other)
    return ln_male, ln_female, ln_other,

ani = FuncAnimation(fig, update, frames=len(df),
                    init_func=init, blit=True)
plt.close()
ani.save('../images/gb_cycling_animation.gif')
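
To see the FuncAnimation pattern in isolation, here is a minimal sketch using made-up yearly counts rather than the cycling data (the `years` and `counts` arrays are invented for illustration). One detail worth checking in any update function is how the frame number maps onto the slice of data drawn: with `frames=N`, the frame argument runs from 0 to N - 1.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Made-up data: one value per year
years = np.arange(2000, 2010)
counts = np.array([5, 7, 6, 9, 12, 11, 15, 14, 18, 20])

anim_fig, anim_ax = plt.subplots()
line, = anim_ax.plot([], [], 'ro-')

def init():
    # Fix the axis limits once, before any data are drawn
    anim_ax.set_xlim(years.min(), years.max())
    anim_ax.set_ylim(0, counts.max() * 1.05)
    return line,

def update(frame):
    # Draw points 0..frame inclusive
    line.set_data(years[:frame + 1], counts[:frame + 1])
    return line,

demo_ani = FuncAnimation(anim_fig, update, frames=len(years),
                         init_func=init, blit=True)

# The update function can be called directly to inspect a single frame
update(0)
print(len(line.get_xdata()))
update(len(years) - 1)
print(len(line.get_xdata()))
```

Calling `update` by hand like this is a cheap way to confirm that the first frame is not empty and that the last frame shows every point before spending time rendering the animation.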


Geographical plots with cartopy

Map projections:

There are various libraries for plotting geospatial data in Python. A good example is the cartopy library (https://scitools.org.uk/cartopy/docs/latest/). Here, I use cartopy to plot the night-time shading for the current time on a flat map of the Earth, along with the location of the University of York and the 15 most populated cities.

The city data are freely available at the following web page:

[7]:
import pandas as pd

# Load the city data
df = (
    pd.read_csv('../data/worldcities.csv')
    .sort_values('population', ascending=False)
    .head(15)
)
df
[7]:
    city         city_ascii   lat       lng       country        iso2  iso3  admin_name        capital  population  id
0   Tokyo        Tokyo        35.6839   139.7744  Japan          JP    JPN   Tōkyō             primary  39105000.0  1392685764
1   Jakarta      Jakarta      -6.2146   106.8451  Indonesia      ID    IDN   Jakarta           primary  35362000.0  1360771077
2   Delhi        Delhi        28.6667   77.2167   India          IN    IND   Delhi             admin    31870000.0  1356872604
3   Manila       Manila       14.6000   120.9833  Philippines    PH    PHL   Manila            primary  23971000.0  1608618140
4   São Paulo    Sao Paulo    -23.5504  -46.6339  Brazil         BR    BRA   São Paulo         admin    22495000.0  1076532519
5   Seoul        Seoul        37.5600   126.9900  South Korea    KR    KOR   Seoul             primary  22394000.0  1410836482
6   Mumbai       Mumbai       19.0758   72.8775   India          IN    IND   Mahārāshtra       admin    22186000.0  1356226629
7   Shanghai     Shanghai     31.1667   121.4667  China          CN    CHN   Shanghai          admin    22118000.0  1156073548
8   Mexico City  Mexico City  19.4333   -99.1333  Mexico         MX    MEX   Ciudad de México  primary  21505000.0  1484247881
9   Guangzhou    Guangzhou    23.1288   113.2590  China          CN    CHN   Guangdong         admin    21489000.0  1156237133
10  Cairo        Cairo        30.0444   31.2358   Egypt          EG    EGY   Al Qāhirah        primary  19787000.0  1818253931
11  Beijing      Beijing      39.9040   116.4075  China          CN    CHN   Beijing           primary  19437000.0  1156228865
12  New York     New York     40.6943   -73.9249  United States  US    USA   New York          NaN      18713220.0  1840034016
13  Kolkāta      Kolkata      22.5727   88.3639   India          IN    IND   West Bengal       admin    18698000.0  1356060520
14  Moscow       Moscow       55.7558   37.6178   Russia         RU    RUS   Moskva            primary  17693000.0  1643318494
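
The sort-then-head pattern used to load the city data picks the rows with the largest populations. As a minimal sketch, the same selection is shown below on a toy DataFrame (a five-row subset typed in by hand from the table, then shuffled), next to `DataFrame.nlargest`, which pandas provides for the same job. The name `toy_cities` is illustrative only.

```python
import pandas as pd

# Hand-typed subset of the city table, shuffled so the order is not already sorted
toy_cities = pd.DataFrame({
    'city': ['Tokyo', 'Jakarta', 'Delhi', 'Manila', 'Seoul'],
    'population': [39105000.0, 35362000.0, 31870000.0, 23971000.0, 22394000.0],
}).sample(frac=1, random_state=0)

# Pattern used above: sort descending, then keep the first n rows
top_sorted = toy_cities.sort_values('population', ascending=False).head(3)

# Equivalent one-step selection
top_nlargest = toy_cities.nlargest(3, 'population')

print(top_sorted.city.tolist())
print(top_nlargest.city.tolist())
```

Both approaches keep the original index labels, so the rows can still be traced back to the full table.
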
[8]:
import datetime
import matplotlib.pyplot as plt
import numpy as np
import cartopy.crs as ccrs
from cartopy.feature.nightshade import Nightshade
%matplotlib widget

# Create a figure with a GeoAxes by specifying a map projection
fig = plt.figure(figsize=(12, 6))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())

# Get current date and time in UTC (Nightshade expects a UTC datetime)
dt = datetime.datetime.now(datetime.timezone.utc)

# Location of University of York
location = (-1.0311947681813436, 53.94930227196749)

# Arrow props
arrowprops=dict(
    arrowstyle='fancy',
    shrinkA=5,
    shrinkB=5,
    fc="k", ec="k",
    connectionstyle="arc3,rad=-0.05",
)

# Add title
ax.set_title(f'Night time shading for {dt}')

# Draw a standard flat map of the world
ax.stock_img()

# Add the nightshade feature
ax.add_feature(Nightshade(dt, alpha=0.4))

# Add University of York location and annotate
ax.scatter(*location, c='r', s=5)
ax.annotate(
    text='University of York',
    xy=location,
    xytext=(-65, 20),
    arrowprops=arrowprops,
    fontweight='bold'
)

# Plot the city locations
ax.scatter(df.lng, df.lat, c='k', s=5)

# Annotate with the names of the cities
for idx, row in df.iterrows():
    ax.annotate(
        text=row.city,
        xy=(row.lng+1, row.lat+1),
        fontsize=8
    )

plt.tight_layout()
plt.show()

Basemap is an older matplotlib toolkit that predates cartopy. Soon I’ll be off to Copenhagen, so I decided to use Basemap to plot the great circle route between Manchester and Copenhagen airports.

[9]:
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt

# create new figure, axes instances.
fig=plt.figure(figsize=(12, 4))
ax=fig.add_axes([0.1,0.1,0.8,0.8])

# setup mercator map projection.
m = Basemap(
    llcrnrlon=-15.,llcrnrlat=45.,urcrnrlon=25.,urcrnrlat=65.,
    rsphere=(6378137.00,6356752.3142),
    resolution='l',projection='merc',
    lat_0=40.,lon_0=-20.,lat_ts=20.
)

# lat/lon for Manchester and Copenhagen airports
cop_lat, cop_lon = 55.62798787190983, 12.643942953245418
man_lat, man_lon = 53.35544507391249, -2.277185420260674

# draw great circle route between Manchester and Copenhagen
m.drawgreatcircle(cop_lon,cop_lat,man_lon,man_lat,linewidth=2,color='b')
m.drawcoastlines()
m.fillcontinents()

# draw parallels
m.drawparallels(np.arange(10,90,20), labels=[1,1,0,1])

# draw meridians
m.drawmeridians(np.arange(-180,180,30),labels=[1,1,0,1])

ax.set_title('Great Circle from Manchester to Copenhagen')
plt.show()
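
The length of the great circle route can be estimated with the haversine formula. This is a rough sketch rather than what Basemap does internally: it assumes a spherical Earth with a mean radius of 6371 km, whereas the map above is set up with an ellipsoid. The function name `haversine_km` is made up for this example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Airport coordinates from the cell above
cop_lat, cop_lon = 55.62798787190983, 12.643942953245418
man_lat, man_lon = 53.35544507391249, -2.277185420260674

distance = haversine_km(man_lat, man_lon, cop_lat, cop_lon)
print(f'{distance:.0f} km')
```

The result comes out at a little under 1000 km, which is a useful sanity check on the plotted route.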