Four Exercises to Practice Web Scraping and API Calls


Tuesday 15-06-2021

The exercise for this hour-long session is as follows! Below are four very quick web-scraping/API examples, each with a basic and an advanced part. They include things we've asked in recent interviews (e.g. the Simpsons), examples from the lecture (e.g. Twitter), an obsession with cats, and things we're using for ongoing research (e.g. the peerage). You should self-select into one of the four breakout rooms depending on how experienced you are with web scraping/APIs or with programming more generally. These are meant to be fun and as informal as possible; don't worry if you can't finish them, it's mostly just to get you practicing!

We'll reconvene in the main room at about 12:45 BST, at which point we'll ask each of the four rooms to present their work for a couple of minutes. If there's time, I'll also post some of my solutions (in Python; sorry!), and if there isn't, I'll post an .ipynb online and in Slack. Let's go!

1a. Basic: Scrape and print out a quote from the Simpsons API.
1b. Advanced: How many quotes are on the API? What's your favourite quote?
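
One possible starting point for exercise 1, using only the Python standard library. This is a sketch, not the official solution: the endpoint URL, the `count` query parameter, and the `quote`/`character` field names are all assumptions about a community-run Simpsons quote API, so swap in whatever the session actually uses. For 1b, sampling repeatedly and counting distinct quotes only gives a lower bound on the total.

```python
import json
import urllib.request

# Assumed endpoint: the exercise text does not give a URL, so replace as needed.
SIMPSONS_URL = "https://thesimpsonsquoteapi.glitch.me/quotes"


def fetch_quotes(count=1, url=SIMPSONS_URL):
    """Request `count` random quotes (assumes the API accepts a `count` parameter)."""
    request = urllib.request.Request(
        "{}?count={}".format(url, count),
        headers={"User-Agent": "scraping-exercise"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def format_quote(record):
    """Turn one JSON record into a printable line."""
    return '"{}" ({})'.format(record["quote"], record["character"])


def unique_quotes(records):
    """Distinct quote strings seen so far: a lower bound on the API's total."""
    return {record["quote"] for record in records}


def demo():
    """Live request; call this yourself from a notebook cell."""
    quotes = fetch_quotes(count=5)
    print(format_quote(quotes[0]))
    print(len(unique_quotes(quotes)), "distinct quotes in this sample")
```

Calling `demo()` performs the live request; the other helpers work on any list of records you have already downloaded.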

2a. Basic: Scrape a picture of a cat from thecatapi and programmatically show it in an .ipynb or .Rmd file.
2b. Advanced: Can you download a hundred of these pictures and turn them into a .gif?
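
A rough sketch for exercise 2, again hedged. The search endpoint, the `limit` parameter, and the `url` field in the response are assumptions about thecatapi (larger batches may need an API key). `show_in_notebook` relies on IPython and `build_gif` relies on Pillow, both third-party packages that are imported lazily so the rest runs without them.

```python
import json
import urllib.request

# Assumed endpoint for thecatapi's image search; check the current docs.
CAT_SEARCH_URL = "https://api.thecatapi.com/v1/images/search"


def extract_image_urls(payload):
    """Pull image URLs out of a decoded JSON response (a list of dicts with a `url` key)."""
    return [item["url"] for item in payload if "url" in item]


def fetch_cat_urls(limit=1):
    """Ask the API for image metadata; `limit` is an assumed query parameter."""
    url = "{}?limit={}".format(CAT_SEARCH_URL, limit)
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_image_urls(json.loads(response.read().decode("utf-8")))


def download(url, dest):
    """Save one remote image to a local file and return its path."""
    urllib.request.urlretrieve(url, dest)
    return dest


def show_in_notebook(url):
    """Display a remote image inline; only meaningful inside Jupyter."""
    from IPython.display import Image, display  # lazy import: needs IPython
    display(Image(url=url))


def build_gif(image_paths, out_path="cats.gif", size=(320, 320), frame_ms=200):
    """Stitch already-downloaded image files into an animated GIF with Pillow."""
    from PIL import Image  # lazy import: needs `pip install pillow`
    # Resize every frame to a common size so the animation does not jump around.
    frames = [Image.open(path).convert("RGB").resize(size) for path in image_paths]
    frames[0].save(
        out_path,
        save_all=True,
        append_images=frames[1:],
        duration=frame_ms,
        loop=0,
    )
    return out_path
```

For 2a, something like `show_in_notebook(fetch_cat_urls()[0])` in a notebook cell should do; for 2b, loop `download` over the URLs and pass the saved paths to `build_gif`.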

3a. Basic: Scrape the last 5 years of tweets from @cbarrie if you have an academic research key/approval. How many times has he mentioned SICSS?
3b. Advanced: What hashtags does @cbarrie spam? (n.b. Chris gives full consent for this analysis!)
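
For exercise 3, the counting step can be sketched independently of how the tweets are collected. The snippet below assumes you already have the tweet texts in a plain list of strings; the sample tweets are made up for illustration and are not real data. Note that `count_mentions` counts tweets containing the term, not every occurrence within a tweet.

```python
import re
from collections import Counter

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def count_mentions(texts, term):
    """Count tweets that contain `term`, ignoring case."""
    needle = term.lower()
    return sum(1 for text in texts if needle in text.lower())


def top_hashtags(texts, n=10):
    """Return the `n` most common hashtags (lowercased) with their counts."""
    tags = (tag.lower() for text in texts for tag in HASHTAG_PATTERN.findall(text))
    return Counter(tags).most_common(n)


# Made-up tweets for illustration only; not real data from any account.
sample_tweets = [
    "Excited for #SICSS this week! #rstats",
    "New paper out #rstats",
    "Teaching at SICSS tomorrow",
]

print(count_mentions(sample_tweets, "sicss"))
print(top_hashtags(sample_tweets, n=2))
```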

4a. Basic: Can you scrape thepeerage and parse the biography of Prince Charles?
4b. Advanced: What percentage of people on the first ten pages are male, and how many are female? (n.b. the very generous Darryl has given permission to scrape his site!)