Internet Data
Note
Source: Contributed by PhD students in COMP 501 at Loyola University Chicago.
Every day you interact with online services powered by data: weather apps, maps, movie databases, sports scores, and more. Much of this data is accessible to programmers through APIs (Application Programming Interfaces). An API provides a structured way to request information from a web service without needing to scrape raw HTML.
What Is an API?
An API defines a predictable way for applications to communicate with each other. Many websites expose public APIs that let you request data directly. Examples include:
- Weather services
- News headlines
- Public transportation schedules
- Currency exchange rates
- Sports statistics
- GitHub repository information
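Most web APIs communicate over HTTP, and many return data in JSON (JavaScript Object Notation), a lightweight text format built from key-value objects and ordered lists.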
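To get a feel for the format, here is a small JSON document, made up for illustration, decoded with Python's standard-library json module. JSON objects become dictionaries, arrays become lists, true/false become True/False, and null becomes None.

```python
import json

# A made-up JSON payload, similar in shape to what a weather API might return
payload = """
{
  "city": "Chicago",
  "units": "metric",
  "temperatures": [21.5, 22.0, 19.8],
  "raining": false,
  "alert": null
}
"""

data = json.loads(payload)  # parse the JSON text into Python objects

print(type(data))                      # <class 'dict'>
print(data["city"])                    # Chicago
print(data["temperatures"][0])         # 21.5
print(data["raining"], data["alert"])  # False None
```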
HTTP Basics
The most common HTTP request types are:
- GET: retrieve data (this is what we use in this chapter)
- POST: send new data to the server
- PUT/PATCH: update existing data (PUT replaces a resource, PATCH modifies part of it)
- DELETE: remove data
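The method is simply part of the request the client sends. As a minimal sketch, the standard-library urllib.request.Request class can build requests without sending them, which makes the chosen method visible. The URL reuses the chapter's placeholder host api.example.com, and the JSON body is made up.

```python
import json
from urllib.request import Request

url = "https://api.example.com/weather"  # placeholder URL; nothing is sent

# Without a body, the default method is GET
get_req = Request(url)

# Supplying a body (and an explicit method) makes it a POST
body = json.dumps({"city": "Chicago"}).encode("utf-8")
post_req = Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})

# Other methods are chosen explicitly
delete_req = Request(url + "/42", method="DELETE")

for req in (get_req, post_req, delete_req):
    print(req.get_method(), req.full_url)
```

With requests, the same methods map to requests.get, requests.post, requests.put, requests.patch, and requests.delete.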
A URL can include query parameters to filter or customize the response:
https://api.example.com/weather?city=Chicago&units=metric
Everything after the ? is the query string: city=Chicago and units=metric are
two name=value parameters separated by &.
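The standard-library urllib.parse module can take the example URL apart, which shows exactly how the query string breaks into parameters:

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.example.com/weather?city=Chicago&units=metric"

parts = urlsplit(url)
print(parts.netloc)  # api.example.com
print(parts.path)    # /weather
print(parts.query)   # city=Chicago&units=metric

# parse_qs maps each name to a list of values, since a name may repeat
params = parse_qs(parts.query)
print(params)  # {'city': ['Chicago'], 'units': ['metric']}
```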
The requests Module
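requests is a widely used third-party library for sending HTTP requests from Python. It is not part of the standard library, so install it if needed: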
```shell
pip install requests
```
Basic usage:
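```python
import requests

url = "https://api.github.com"

# Send a GET request; timeout (seconds) keeps the program from hanging forever
response = requests.get(url, timeout=10)

print(response.status_code)  # e.g. 200 on success
print(response.text)         # the raw response body as a string
```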
Common status codes:
- 200: OK (success)
- 404: Not Found
- 401: Unauthorized
- 500: Internal Server Error
Always check the status code before using the response data.
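Status codes are grouped by their first digit: 2xx means success, 3xx a redirect, 4xx a problem with the client's request, and 5xx a failure on the server. The sketch below uses a hypothetical helper, describe_status, written only for illustration, together with the standard-library http.HTTPStatus enum, which supplies the official reason phrase. With requests you can also test response.ok (True for codes below 400) or call response.raise_for_status(), which raises requests.exceptions.HTTPError for 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short description of an HTTP status code (hypothetical helper)."""
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "unknown"
    try:
        phrase = HTTPStatus(code).phrase  # official reason phrase
    except ValueError:                    # code not defined in HTTPStatus
        phrase = "Unrecognized status"
    return f"{code} {phrase} ({category})"


for code in (200, 404, 401, 500):
    print(describe_status(code))
```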
Parsing JSON
When an API returns JSON, the response object's json() method converts the body
directly into Python objects:
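```python
import requests

url = "https://api.github.com/repos/python/cpython"
response = requests.get(url, timeout=10)

if response.status_code == 200:
    data = response.json()           # JSON body -> Python dict
    print(data["full_name"])         # python/cpython
    print(data["stargazers_count"])  # number of stars (changes over time)
```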
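response.json() parses the JSON body and returns the equivalent Python object,
usually a dictionary or a list of dictionaries. If the body is not valid JSON (an
HTML error page, for example), it raises an exception that is a subclass of
ValueError, which is one more reason to check the status code first.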
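Real responses are often nested, and optional fields may be missing or null. The minimal sketch below works on an already-parsed response; the dictionary is a made-up stand-in shaped loosely like a GitHub repository response, and all of its values are invented. Indexing works when a field is guaranteed, while .get() with a default avoids a KeyError when it may be absent.

```python
# Invented sample data standing in for response.json(); not real GitHub values
repo = {
    "full_name": "example/project",
    "stargazers_count": 42,
    "owner": {"login": "example"},
    "license": None,
    "topics": ["api", "tutorial"],
}

# Index directly when a field is guaranteed to exist
print(repo["owner"]["login"])  # example

# Use .get() with a default when a field may be missing
print(repo.get("homepage", "no homepage listed"))

# A field can be present but null (None); "or {}" covers that case too
license_name = (repo.get("license") or {}).get("name", "no license")
print(license_name)  # no license

# A JSON array becomes a Python list you can loop over
for topic in repo.get("topics", []):
    print("topic:", topic)
```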
Using Query Parameters
Instead of building query strings by hand, pass the parameters as a dictionary through the params argument; requests encodes them and appends them to the URL:
```python
import requests

url = "https://api.example.com/search"  # placeholder; substitute a real API endpoint
params = {"q": "python", "limit": 5}

response = requests.get(url, params=params, timeout=10)
print(response.url)  # e.g. https://api.example.com/search?q=python&limit=5
print(response.json())
```
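Part of what params saves you is escaping. Pasted by hand into a URL, a value such as "rock & roll" would be split at the & into a second, bogus parameter. The standard-library urllib.parse.urlencode performs comparable encoding, so it shows what happens to spaces, special characters, and repeated values. The parameter names and values below are made up for illustration, and requests's own output may differ in small details.

```python
from urllib.parse import urlencode

# Made-up parameters: a value containing a space and "&", plus a list value
params = {"q": "rock & roll", "limit": 5, "tag": ["music", "1950s"]}

# doseq=True expands the list into repeated name=value pairs
query = urlencode(params, doseq=True)
print(query)  # q=rock+%26+roll&limit=5&tag=music&tag=1950s

url = "https://api.example.com/search?" + query  # placeholder URL
print(url)
```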
Handling Errors
Defensive code sets a timeout, checks the status code, and handles network failures:
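```python
import requests

url = "https://api.example.com/data"  # placeholder URL

try:
    response = requests.get(url, timeout=5)
    if response.status_code != 200:
        print("Error:", response.status_code)
    else:
        print(response.json())
# Catch Timeout first: a connect timeout is both a Timeout and a ConnectionError
except requests.exceptions.Timeout:
    print("Request timed out.")
except requests.exceptions.ConnectionError:
    print("Network error: could not connect.")
except requests.exceptions.RequestException as err:
    # Base class for every other error raised by requests
    print("Request failed:", err)
```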
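Transient failures such as dropped connections or timeouts are often worth retrying a few times before giving up. Below is a minimal sketch of a retry loop with exponential backoff; with_retries is a hypothetical helper, not part of requests. It accepts any zero-argument function, so it can be exercised without a network. In real code you might pass `lambda: requests.get(url, timeout=5)` along with `exceptions=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper for illustration: waits base_delay, 2*base_delay, ...
    between attempts and re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle it
            time.sleep(base_delay * 2 ** attempt)


# Offline demonstration: a fake "request" that fails twice, then succeeds
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return {"status": "ok"}


result = with_retries(flaky_request, attempts=3, base_delay=0.01,
                      exceptions=(ConnectionError,))
print(result, "after", calls["count"], "attempts")
```

Doubling the delay between attempts gives a struggling server progressively more breathing room instead of hammering it at a fixed rate.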
Saving API Results to a File
After retrieving data, you can save it locally for later analysis:
```python
import json

import requests

url = "https://api.github.com/users/octocat"
response = requests.get(url, timeout=10)

if response.status_code == 200:
    data = response.json()
    # Write the parsed data back out as formatted JSON text
    with open("octocat.json", "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
```
json.dump writes a Python object to a file as JSON. indent=2 makes the
output human-readable.
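To use the saved data later, read it back with json.load. The round-trip sketch below uses a made-up dictionary in place of the API response and a temporary directory so it leaves no files behind. Passing ensure_ascii=False (with encoding="utf-8") keeps non-ASCII characters, common in people's names, readable in the file instead of escaping them.

```python
import json
import os
import tempfile

# Made-up stand-in for response.json()
data = {"login": "example", "name": "José Example", "public_repos": 3}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "user.json")

    # Save: ensure_ascii=False writes "é" as-is rather than as an escape sequence
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    # Load it back, as a separate analysis script might
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

    # Also read the raw text to see what was written
    with open(path, encoding="utf-8") as f:
        raw_text = f.read()

print(loaded == data)  # True
```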
Putting It Together
Here is a complete program that looks up a GitHub user:
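```python
import requests

username = input("Enter a GitHub username: ").strip()
url = f"https://api.github.com/users/{username}"

try:
    response = requests.get(url, timeout=5)
    if response.status_code == 200:
        data = response.json()
        # "name" can be null (None) in the JSON, so fall back with "or"
        print(f"Name: {data.get('name') or 'N/A'}")
        print(f"Followers: {data['followers']}")
        print(f"Public repos: {data['public_repos']}")
    elif response.status_code == 404:
        print(f"User '{username}' not found.")
    elif response.status_code in (403, 429):
        # Unauthenticated requests to the GitHub API are rate-limited
        print("Rate limit reached; try again later.")
    else:
        print(f"Unexpected error: {response.status_code}")
except requests.exceptions.Timeout:
    print("GitHub took too long to respond.")
except requests.exceptions.ConnectionError:
    print("Could not connect to GitHub.")
```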
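One caveat: the program builds the URL with an f-string, so whatever the user types goes straight into the path. Input such as "octocat/repos" would silently hit a different endpoint, and spaces would produce an invalid URL. The sketch below shows a hypothetical user_url helper that escapes the input with urllib.parse.quote first; safe="" makes quote escape "/" as well.

```python
from urllib.parse import quote


def user_url(username: str) -> str:
    """Build a GitHub user URL with the username escaped (hypothetical helper)."""
    cleaned = username.strip()  # drop stray whitespace from input()
    return "https://api.github.com/users/" + quote(cleaned, safe="")


print(user_url("octocat"))        # https://api.github.com/users/octocat
print(user_url(" octocat\n"))     # https://api.github.com/users/octocat
print(user_url("octocat/repos"))  # https://api.github.com/users/octocat%2Frepos
```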
Exercises
1. Request data from https://api.github.com/users/octocat and print the login, public_repos, and followers fields.
2. Use the Open-Meteo API to fetch the first five hourly temperature values for Chicago (latitude 41.88, longitude -87.62):
   https://api.open-meteo.com/v1/forecast?latitude=41.88&longitude=-87.62&hourly=temperature_2m
3. Request country information from:
   https://restcountries.com/v3.1/name/canada
   Print the population, region, and capital.
4. Write a program that accepts a city name from the user and fetches its coordinates from:
   https://geocoding-api.open-meteo.com/v1/search?name=<city>
   Save the JSON response to city_data.json. Handle both the case where the city is not found and the case where there is no internet connection.
5. Some APIs paginate large datasets. Write a loop that fetches up to five pages of results from an API of your choice, merges them into a single list, and prints the total number of items retrieved.
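For the pagination exercise, the overall shape of the loop may help. This sketch assumes a hypothetical fetch_page(page) function that returns the list of items for a page number and an empty list once you run past the end. APIs paginate in different ways (page numbers, cursors, Link headers; GitHub's REST API, for example, accepts page and per_page query parameters), so adapt it to the API you choose. A fake fetch_page stands in here so the loop runs offline.

```python
def fetch_all(fetch_page, max_pages=5):
    """Collect items from successive pages until one is empty or max_pages is reached.

    fetch_page is a hypothetical callable: page number (starting at 1) -> list of items.
    """
    all_items = []
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:            # an empty page signals there is no more data
            break
        all_items.extend(items)  # merge this page into the combined list
    return all_items


# Fake data source: three pages of 10 items each, then empty pages
def fake_fetch_page(page):
    if page > 3:
        return []
    start = (page - 1) * 10
    return list(range(start, start + 10))


items = fetch_all(fake_fetch_page)
print("Total items retrieved:", len(items))  # Total items retrieved: 30
```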