Week 6 - Exercises

Exercise 1 - GB cycling accidents

In the data folder of the course materials you should find a CSV file called gb_cycling_accidents.csv which contains data on bicycle accidents in Great Britain from 1970 to 2018. I retrieved the data set from kaggle, which cites data.world as the original source. Each row holds information about a specific accident, and each column holds information about the accident, such as the date, time of day, day of week, number of vehicles involved, weather conditions, severity, etc. Here is the full explanation of the columns in the data set.

Variable

Definition

Accident_Index

Unique identifier for the accident. This may be thought of as the accident “case number”.

Number_of_Vehicles

Number of vehicles that were involved in the accident

Number_of_Casualties

Number of casualties resulting from the accident

Date

Date when the accident happened

Time

Time when the accident happened

Speed_limit

Speed limit on the part of the road where the accident took place

Road_conditions

Road condition (e.g., “frost”) at the time and place of the accident

Weather_conditions

Whether condition (e.g., “rain”) at time and place of the accident

Day

Day of the week when the accident occurred

Road_type

Type of road (e.g., “Dual carriageway”) where the accident happened

Light_conditions

Light conditions (e.g., “Daylight”) at time of accident

Gender

Whether the accident victim was Male or Female

Severity

How severe (e.g., “Serious”) the accident was

Age_Grp

Age group of the accident victim

Let’s explore the frequency of accidents with respect to the different variables.

1. Import pandas and read data/gb_cycling_accidents.csv into a DataFrame

[ ]:

2. How many unique values are in the following columns?

  • Speed_limit

  • Road_conditions

  • Weather_conditions

  • Road_type

  • Light_conditions

  • Gender

  • Severity

  • Age_Grp

[ ]:

3. What road conditions were associated with the most and least accidents?

[ ]:

4. What weather conditions were associated with the most and least accidents?

[ ]:

5. What road type was associated with the most and least accidents?

[ ]:

6. What light conditions were associated with the most and least accidents?

[ ]:

7. What speed limit was associated with the most and least accidents?

[ ]:

8. Based on the above, write a single sentence that summarises the conditions in which most accidents appeared to occur.

[ ]:

9. Create a bar chart showing how accidents were distributed by Age_Grp

[ ]:

10. Across all accidents, what percentage involved Males, what percentage involved Females, and what percentage involved people identifying as ‘Other’? Show the results in a pie chart.

[ ]:

11. What was the highest number of vehicles involved in a single accident?

[ ]:

12. What was the highest number of casualties involved in a single accident?

[ ]:

13. On which day of the week did the accident with Accident_Index 201443N027074 occur?

[ ]:

14. Create a separate DataFrame for all serious accidents that happened on a Sunday in wet road conditions. How many were there?

[ ]:

15. Create and assign a new `DatetimeIndex <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html>`__ for the DataFrame using the Date and Time columns

[ ]:

16. Add a new column to the DataFrame called long_date. It should contain the correct dates matching the following format.

  • Wednesday 09 February 2012

[ ]:

17. What is the worst day on record in terms of the number of accidents that were reported?

[ ]:

18. Make a bar chart showing total accidents by month of the year

[ ]:

19. Make a line graph showing the total number of accidents that occurred each year from 1979-2018. Have accidents declined overall? In which years did the most and least cycling accidents occur?

[ ]:

20. Repeat the above, but this time with separate lines for ``Gender``

[ ]:

21. Repeat the above, but this time with separate lines for ``Age_Grp``

[ ]:

22. Repeat the above, but this time with separate subplots for ``Severity``

[ ]:

23. Make a bar chart showing the total number of accidents for each hour in the day from 1979-2018

[ ]:

24. As above, but with stacked bars using different colours for each day of the week

[ ]:

25. Make a bar chart showing the year-on-year percentage change for accidents with different coloured bars for each Severity

[ ]: