End of course assignment

It’s time for you to put your newly developed Python skills to use! The assignment is simple: find some data and tell a story. At the very least, this will require you to:

You can complete the task in this notebook using either Jupyter or Google Colab, or by writing a script in Spyder. Some potential sources of interesting data are listed below, but there are many other sources of quality data on the internet and you are free to choose whatever you want. In the final session, you will be able to share your screen via Zoom, or you can post an image or PDF document in the group chat.

Potential sources of data

Kaggle

This excellent little website is as much a thriving community of data scientists as it is a source of quality data. After making a free account, you can browse over 178,000 data sets or search using keywords. Members of the Kaggle community also post code, analyses and discussion points, which you can look to for inspiration.

Here are some interesting data sets that caught my eye on Kaggle:

  • Blood Type Distribution by Country

    • What are the most and least common blood types, and which countries appear to have the highest and lowest proportions of these blood types?

  • Indian Food 101

    • How many unique ingredients are there across all dishes in the data set?

    • What is the most common ingredient?

    • From which state do most dishes originate?

    • Including both preparation and cooking time, what are the fastest and slowest dishes to make?

  • World Happiness Report

    • In the period 2015 to 2019, which countries were the happiest? For which countries did happiness increase / decrease most substantially in this period?
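To give a feel for how one of these questions might be approached, here is a minimal sketch for the Indian Food 101 ingredient questions. It uses a few made-up rows instead of the real download, and the column names ("name", "ingredients", "state") are assumptions, so check them against the actual file before adapting it.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for the real download. The column names
# are assumptions for illustration; inspect the real CSV header first.
SAMPLE_CSV = """name,ingredients,state
Dish A,"Rice, Milk, Sugar",State X
Dish B,"Rice, Lentils, salt",State Y
Dish C,"Milk, sugar, Cardamom",State X
"""


def count_ingredients(csv_file):
    """Count how many dishes each ingredient appears in."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Split the comma-separated cell and normalise case and whitespace
        # so that "Sugar" and " sugar" are treated as the same ingredient.
        ingredients = {item.strip().lower() for item in row["ingredients"].split(",")}
        ingredients.discard("")  # ignore empty entries from stray commas
        counts.update(ingredients)
    return counts


counts = count_ingredients(io.StringIO(SAMPLE_CSV))

# Tally dishes per state with a second pass over the same data.
state_counts = Counter(
    row["state"] for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
)

print("Unique ingredients:", len(counts))
print("Most common ingredients:", counts.most_common(3))
print("State with most dishes:", state_counts.most_common(1))
```

With the real file, you would replace `io.StringIO(SAMPLE_CSV)` with an opened file, for example `open("indian_food.csv", newline="")`, where the file name is also an assumption.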

hockeyfights.com

I’m not a hockey fan, let alone a bloodthirsty one. This website makes the list purely because it is a well-curated source of interesting data. I used it some years ago in a similar assignment to explore the relationship between the number of fights and games won / lost in a season for the Detroit Red Wings and Toronto Maple Leafs.

  • Which are the top three most bellicose hockey teams?

  • Which teams have the greatest fighting rivalry?

  • Is there a relationship between number of fights in a season and the number of games won / lost?

Our World in Data

This brilliant website publishes data-driven reports on poverty, disease, hunger, climate change, war and many other big problems that exist in the world. Most reports include a link to download the raw data, usually in CSV format. Here are a few examples:

  • Global CO2 Emissions

    • Since records began, which 10 countries are responsible for the most CO2 emissions resulting from fossil fuel burning? Which 10 are responsible for the least emissions?

    • As above, but for emissions resulting from land use changes

  • Wild Mammals are making a comeback

    • Download the data and recreate the chart on the web page

  • Air Pollution

    • Over the last 10 years in the United Kingdom, which pollutants have increased, and which have decreased?
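As a rough illustration of the "which 10 countries" style of question, the sketch below sums a value column per country and ranks the totals. The rows are made up, and the column names ("Entity", "Year", "emissions") are assumptions about how a downloaded CSV might be laid out. Downloads may also contain aggregate rows such as "World", which you would want to exclude before ranking countries.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; the column names are assumptions, so
# check the header of the file you actually download.
SAMPLE_CSV = """Entity,Year,emissions
Country A,2000,10
Country A,2001,12
Country B,2000,3
Country B,2001,4
Country C,2000,7
World,2000,20
World,2001,16
"""


def cumulative_totals(csv_file, value_column, exclude=("World",)):
    """Sum value_column over all years for each entity."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        # Skip aggregate rows and rows where the value is missing.
        if row["Entity"] in exclude or not row[value_column]:
            continue
        totals[row["Entity"]] += float(row[value_column])
    return dict(totals)


totals = cumulative_totals(io.StringIO(SAMPLE_CSV), "emissions")

# Rank from largest to smallest cumulative total.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print("Largest 10:", ranked[:10])
print("Smallest 10:", ranked[-10:])
```

The same pattern works for the land use question by pointing `value_column` at a different column. If you have covered pandas, a `groupby` followed by `sum` does the same job in fewer lines.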