Week 5 - Exercises¶
Exercise 1 - RMS Titanic¶
On April 15, 1912, during her maiden voyage from Southampton to New York, the four-funnelled ocean liner RMS Titanic hit an iceberg and sank to the bottom of the ocean in what is now one of the most infamous maritime disasters in history. For aesthetic reasons, and because the ship was widely considered unsinkable, many lifeboats were removed before the launch. This unfortunate decision contributed to the death of 1502 out of 2224 passengers and crew.
The Encyclopedia Titanica maintain a database of the names of passengers on the Titanic, which includes other information such as age, sex, and whether the passenger survived. A subset of these data are available freely through Kaggle, an online data repository that also hosts regular data science competitions. Extensive analyses of the Titanic data set have shown that, while luck played its part, some groups of passengers did appear to have a better chance of surviving than others.
The Titanic passenger dataset from Kaggle can be found in the data folder of the course materials, with the name titanic.csv. Let’s use pandas to interrogate it!
Here is the key to the data:
Variable |
Definition |
Key |
|---|---|---|
PassengerID |
Unique numeric identifier for the passenger |
NA |
Survival |
Survival |
0 = No, 1 = Yes |
Pclass |
Ticket class |
1 = 1st, 2 = 2nd, 3 = 3rd |
Sex |
Sex |
NA |
Age |
Age in years |
NA |
SibSp |
Number of siblings / spouses aboard the Titanic |
NA |
Parch |
Number of parents / children aboard the Titanic |
NA |
Ticket |
Ticket number |
NA |
Fare |
Passenger fare |
NA |
Cabin |
Cabin number |
NA |
Embarked |
Port of Embarkation |
C = Cherbourg, Q = Queenstown, S = Southampton |
Notes:
Pclass: a proxy for socio-economic status - 1st = Upper - 2nd = Middle - 3rd = Lower
Age: fractional if less than 1. If the age is estimated, it is in the form of xx.5
SibSp: The dataset defines family relations as follows: - Sibling = brother, sister, stepbrother, stepsister - Spouse = husband, wife (mistresses and fiancés were ignored)
Parch: The dataset defines family relations as follows: - Parent = mother, father - Child = daughter, son, stepdaughter, stepson - Some children travelled only with a nanny, therefore parch=0 for them.
[1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('bmh')
1. Import pandas and read data/titanic.csv into a DataFrame¶
[2]:
import pandas as pd
df = pd.read_csv('../data/titanic.csv')
df
[2]:
| PassengerID | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
2. Display the first 6 rows of the DataFrame¶
[3]:
df.head(6)
[3]:
| PassengerID | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
3. Display the last 6 rows of the DataFrame¶
[4]:
df.tail(6)
[4]:
| PassengerID | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.125 | NaN | Q |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.450 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.000 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.750 | NaN | Q |
4. Assign PassengerID as the index¶
[5]:
df = df.set_index('PassengerID')
5. Print out all of the available information for the passenger with 567 as their PassengerID¶
[6]:
df.loc[567]
[6]:
Survived 0
Pclass 3
Name Stoytcheff, Mr. Ilia
Sex male
Age 19.0
SibSp 0
Parch 0
Ticket 349205
Fare 7.8958
Cabin NaN
Embarked S
Name: 567, dtype: object
6. Generate some descriptive statistics for the numeric data¶
[7]:
df.describe()
[7]:
| Survived | Pclass | Age | SibSp | Parch | Fare | |
|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
7. How many male and female passengers are there in this dataset? Use pandas to show this information in a simple bar chart.¶
[8]:
(
df.Sex.value_counts(normalize=True)
.plot(kind='bar', xlabel='Sex', ylabel='Total Passengers (%)', rot=0)
); # rot=0 stops the 'male' and 'female' labels being displayed as horizontal text
8. What were the names of the oldest and youngest passengers?¶
[9]:
oldest_youngest = [df.Age.min(), df.Age.max()]
df.loc[df.Age.isin(oldest_youngest)]
[9]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | |||||||||||
| 631 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.00 | 0 | 0 | 27042 | 30.0000 | A23 | S |
| 804 | 1 | 3 | Thomas, Master. Assad Alexander | male | 0.42 | 0 | 1 | 2625 | 8.5167 | NaN | C |
9. How many of the 5 oldest female passengers traveled first class?¶
[10]:
# All but one
five_oldest_females = (
df.loc[df.Sex=='female']
.sort_values('Age', ascending=False)
.iloc[0:5]
)
five_oldest_females
[10]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | |||||||||||
| 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 | NaN | S |
| 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
| 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
| 367 | 1 | 1 | Warren, Mrs. Frank Manley (Anna Sophia Atkinson) | female | 60.0 | 1 | 0 | 110813 | 75.2500 | D37 | C |
| 269 | 1 | 1 | Graham, Mrs. William Thompson (Edith Junkins) | female | 58.0 | 0 | 1 | PC 17582 | 153.4625 | C125 | S |
[11]:
# To get the values... 4 travelled 1st class, 1 travelled third class
five_oldest_females.Pclass.value_counts()
[11]:
1 4
3 1
Name: Pclass, dtype: int64
10. How many passengers embarked at Southampton?¶
[12]:
# 644
df.Embarked.value_counts()
[12]:
S 644
C 168
Q 77
Name: Embarked, dtype: int64
11. Who paid the most for their ticket?¶
[13]:
df.loc[df.Fare==df.Fare.max()]
[13]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | |||||||||||
| 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C |
| 680 | 1 | 1 | Cardeza, Mr. Thomas Drake Martinez | male | 36.0 | 0 | 1 | PC 17755 | 512.3292 | B51 B53 B55 | C |
| 738 | 1 | 1 | Lesurer, Mr. Gustave J | male | 35.0 | 0 | 0 | PC 17755 | 512.3292 | B101 | C |
[14]:
# Just get the names
df.loc[df.Fare==df.Fare.max()].Name
[14]:
PassengerID
259 Ward, Miss. Anna
680 Cardeza, Mr. Thomas Drake Martinez
738 Lesurer, Mr. Gustave J
Name: Name, dtype: object
12. Of the top ten passengers who paid the most for their ticket, how many survived?¶
[15]:
# 7 out of 10
# Get the top 10 who paid the most
df.sort_values('Fare', ascending=False).iloc[0:10] # ascending=False
[15]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | |||||||||||
| 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C |
| 738 | 1 | 1 | Lesurer, Mr. Gustave J | male | 35.0 | 0 | 0 | PC 17755 | 512.3292 | B101 | C |
| 680 | 1 | 1 | Cardeza, Mr. Thomas Drake Martinez | male | 36.0 | 0 | 1 | PC 17755 | 512.3292 | B51 B53 B55 | C |
| 89 | 1 | 1 | Fortune, Miss. Mabel Helen | female | 23.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
| 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
| 342 | 1 | 1 | Fortune, Miss. Alice Elizabeth | female | 24.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
| 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |
| 312 | 1 | 1 | Ryerson, Miss. Emily Borie | female | 18.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C |
| 743 | 1 | 1 | Ryerson, Miss. Susan Parker "Suzette" | female | 21.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C |
| 119 | 0 | 1 | Baxter, Mr. Quigg Edmond | male | 24.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C |
[16]:
# Just get the number who survived by summing the Survived column
df.sort_values('Fare', ascending=False).iloc[0:10].Survived.sum()
[16]:
7
13. Of the top ten passengers who paid the least for their ticket, how many survived?¶
[17]:
# Only 1
df.sort_values('Fare', ascending=True).iloc[0:10] # Now ascending=True
[17]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | |||||||||||
| 272 | 1 | 3 | Tornquist, Mr. William Henry | male | 25.0 | 0 | 0 | LINE | 0.0 | NaN | S |
| 598 | 0 | 3 | Johnson, Mr. Alfred | male | 49.0 | 0 | 0 | LINE | 0.0 | NaN | S |
| 303 | 0 | 3 | Johnson, Mr. William Cahoone Jr | male | 19.0 | 0 | 0 | LINE | 0.0 | NaN | S |
| 634 | 0 | 1 | Parr, Mr. William Henry Marsh | male | NaN | 0 | 0 | 112052 | 0.0 | NaN | S |
| 278 | 0 | 2 | Parkes, Mr. Francis "Frank" | male | NaN | 0 | 0 | 239853 | 0.0 | NaN | S |
| 414 | 0 | 2 | Cunningham, Mr. Alfred Fleming | male | NaN | 0 | 0 | 239853 | 0.0 | NaN | S |
| 675 | 0 | 2 | Watson, Mr. Ennis Hastings | male | NaN | 0 | 0 | 239856 | 0.0 | NaN | S |
| 264 | 0 | 1 | Harrison, Mr. William | male | 40.0 | 0 | 0 | 112059 | 0.0 | B94 | S |
| 467 | 0 | 2 | Campbell, Mr. William | male | NaN | 0 | 0 | 239853 | 0.0 | NaN | S |
| 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0 | NaN | S |
[18]:
df.sort_values('Fare', ascending=True).head(10).Survived.sum()
[18]:
1
14. Of all surviving passengers, how many were male and how many were female?¶
[19]:
df.loc[df.Survived==1, 'Sex'].value_counts()
[19]:
female 233
male 109
Name: Sex, dtype: int64
15. Make a boxplot describing the age distribution for males and females, showing the mean age as well as the median.¶
[20]:
title = 'Age distribution of male and female passengers on the RMS Titanic'
ax = df.boxplot(column='Age', by='Sex', showmeans=True, ylabel='Age');
ax.set_title(title); # Apply sensible title to the axes
ax.get_figure().suptitle(''); # Get read of the default figure title
16. Get the information for all passengers with ‘Frank’ in their name¶
[21]:
df.loc[df.Name.str.contains('Frank')]
[21]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | |||||||||||
| 94 | 0 | 3 | Dean, Mr. Bertram Frank | male | 26.0 | 1 | 2 | C.A. 2315 | 20.575 | NaN | S |
| 166 | 1 | 3 | Goldsmith, Master. Frank John William "Frankie" | male | 9.0 | 0 | 2 | 363291 | 20.525 | NaN | S |
| 278 | 0 | 2 | Parkes, Mr. Francis "Frank" | male | NaN | 0 | 0 | 239853 | 0.000 | NaN | S |
| 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.750 | NaN | Q |
| 329 | 1 | 3 | Goldsmith, Mrs. Frank John (Emily Alice Brown) | female | 31.0 | 1 | 1 | 363291 | 20.525 | NaN | S |
| 367 | 1 | 1 | Warren, Mrs. Frank Manley (Anna Sophia Atkinson) | female | 60.0 | 1 | 0 | 110813 | 75.250 | D37 | C |
| 549 | 0 | 3 | Goldsmith, Mr. Frank John | male | 33.0 | 1 | 1 | 363291 | 20.525 | NaN | S |
17. What was the surname of the largest family on board the Titanic?¶
[22]:
# Create a new column that is the sum of Parch and SibSp (refer back to the data key)
# and then sort the DataFrame by this column to see which surname appears at the top.
# The Sage family looks to have been the largest, although some family members
# appear not be included in the dataset.
df['RelsAboard'] = df.SibSp + df.Parch
df.sort_values('RelsAboard', ascending=False).head(10)
[22]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | RelsAboard | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | ||||||||||||
| 181 | 0 | 3 | Sage, Miss. Constance Gladys | female | NaN | 8 | 2 | CA. 2343 | 69.55 | NaN | S | 10 |
| 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN | 8 | 2 | CA. 2343 | 69.55 | NaN | S | 10 |
| 793 | 0 | 3 | Sage, Miss. Stella Anna | female | NaN | 8 | 2 | CA. 2343 | 69.55 | NaN | S | 10 |
| 202 | 0 | 3 | Sage, Mr. Frederick | male | NaN | 8 | 2 | CA. 2343 | 69.55 | NaN | S | 10 |
| 160 | 0 | 3 | Sage, Master. Thomas Henry | male | NaN | 8 | 2 | CA. 2343 | 69.55 | NaN | S | 10 |
| 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN | 8 | 2 | CA. 2343 | 69.55 | NaN | S | 10 |
| 325 | 0 | 3 | Sage, Mr. George John Jr | male | NaN | 8 | 2 | CA. 2343 | 69.55 | NaN | S | 10 |
| 387 | 0 | 3 | Goodwin, Master. Sidney Leonard | male | 1.0 | 5 | 2 | CA 2144 | 46.90 | NaN | S | 7 |
| 60 | 0 | 3 | Goodwin, Master. William Frederick | male | 11.0 | 5 | 2 | CA 2144 | 46.90 | NaN | S | 7 |
| 679 | 0 | 3 | Goodwin, Mrs. Frederick (Augusta Tyler) | female | 43.0 | 1 | 6 | CA 2144 | 46.90 | NaN | S | 7 |
18. What was the name, sex and age of the youngest passenger to embark at Cherbourg?¶
[23]:
# Select and sort the Cherbourg embarkees by age
df.loc[df.Embarked=='C'].sort_values('Age', ascending=True).head()
[23]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | RelsAboard | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | ||||||||||||
| 804 | 1 | 3 | Thomas, Master. Assad Alexander | male | 0.42 | 0 | 1 | 2625 | 8.5167 | NaN | C | 1 |
| 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C | 3 |
| 645 | 1 | 3 | Baclini, Miss. Eugenie | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C | 3 |
| 828 | 1 | 2 | Mallet, Master. Andre | male | 1.00 | 0 | 2 | S.C./PARIS 2079 | 37.0042 | NaN | C | 2 |
| 382 | 1 | 3 | Nakid, Miss. Maria ("Mary") | female | 1.00 | 0 | 2 | 2653 | 15.7417 | NaN | C | 2 |
[24]:
# Just get the relevant information
df.loc[df.Embarked=='C'].sort_values('Age', ascending=True).iloc[0].loc[['Name', 'Sex', 'Age']]
[24]:
Name Thomas, Master. Assad Alexander
Sex male
Age 0.42
Name: 804, dtype: object
19. Who stayed in cabin D56? Can you find out anything interesting about this person on the Encyclopedia Titanica?¶
[25]:
# Beesley, Mr. Lawrence
df.loc[df.Cabin=='D56']
[25]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | RelsAboard | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | ||||||||||||
| 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34.0 | 0 | 0 | 248698 | 13.0 | D56 | S | 0 |
20. There was a Countess aboard the Titanic. What information can you find about her?¶
[26]:
df[df.Name.str.contains('Countess')]
[26]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | RelsAboard | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | ||||||||||||
| 760 | 1 | 1 | Rothes, the Countess. of (Lucy Noel Martha Dye... | female | 33.0 | 0 | 0 | 110152 | 86.5 | B77 | S | 0 |
21. Make two new columns containing the first and last name for each passenger.¶
[27]:
df['FirstName'] = df.Name.str.split(',').str.get(1)
df['LastName'] = df.Name.str.split(',').str.get(0)
df.head()
[27]:
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | RelsAboard | FirstName | LastName | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PassengerID | ||||||||||||||
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | 1 | Mr. Owen Harris | Braund |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 1 | Mrs. John Bradley (Florence Briggs Thayer) | Cumings |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | 0 | Miss. Laina | Heikkinen |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | 1 | Mrs. Jacques Heath (Lily May Peel) | Futrelle |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | 0 | Mr. William Henry | Allen |
22. Make a bar chart showing the proportion of males and females who survived in each class¶
[28]:
colors = ['tab:red', 'tab:blue']
(
df.groupby(['Pclass', 'Sex'])
.Survived
.mean()
.plot(kind='bar', color=colors, ylabel='Proportion of survivors', rot=45)
);
[28]:
<AxesSubplot:xlabel='Pclass,Sex', ylabel='Proportion of survivors'>