Extracting Census Data from a Web Table and Saving as CSV

This guide shows how to retrieve a demographic table from a Census report web page and save it as a CSV file using Python. It uses the requests library to download the page and pandas to parse the table.

Prerequisites

Make sure you have the following Python packages installed (lxml, html5lib, and beautifulsoup4 are the HTML parsers that pandas.read_html relies on):

pip install requests pandas lxml html5lib beautifulsoup4
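
To confirm the installation worked, a quick check along the following lines can help. This is a minimal sketch, not part of the original guide: the helper name missing_packages is hypothetical, and the list of import names assumes the usual mapping where the beautifulsoup4 package is imported as bs4.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Import names for the packages installed above; note that the
# beautifulsoup4 distribution is imported as "bs4".
REQUIRED = ["requests", "pandas", "lxml", "html5lib", "bs4"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, rerun the pip command above in the same environment that runs your script.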

Step-by-Step Guide

  1. Import Required Libraries. First, we need to import the necessary libraries:

    import requests
    import pandas as pd

  2. Define the URL. Specify the URL of the Census report you want to extract data from:

    url = 'http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500'

  3. Fetch the HTML Content. Use the requests library to get the content of the webpage:

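    response = requests.get(url, timeout=30)  # avoid hanging on a stalled connection
    response.raise_for_status()  # stop early on an HTTP error status
    html_content = response.content

  4. Extract the Table. Use pandas to parse the HTML content and extract the tables. The relevant table is usually the last one on the page: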

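    from io import BytesIO  # recent pandas versions expect a file-like object rather than raw HTML

    tables = pd.read_html(BytesIO(html_content))
    demographic_table = tables[-1]  # Adjust index if necessary

  5. Save to CSV. Finally, save the extracted table to a CSV file: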

    demographic_table.to_csv('census_data.csv', index=False)
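
The steps above can be combined into a few small functions. The following is a sketch under stated assumptions rather than a definitive implementation: the function names (fetch_html, extract_last_table, save_demographic_table) and the 30-second timeout are illustrative choices, and it keeps the guide's assumption that the table of interest is the last one on the page.

```python
from io import BytesIO

import pandas as pd
import requests


def fetch_html(url: str, timeout: float = 30.0) -> bytes:
    """Download a page and return its raw bytes, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


def extract_last_table(html: bytes) -> pd.DataFrame:
    """Parse every <table> in the page and return the last one.

    pandas raises ValueError when the page contains no tables.
    """
    tables = pd.read_html(BytesIO(html))
    return tables[-1]  # assumption from the guide: the last table is the relevant one


def save_demographic_table(url: str, csv_path: str) -> pd.DataFrame:
    """Fetch the page, extract the last table, and write it to csv_path."""
    table = extract_last_table(fetch_html(url))
    table.to_csv(csv_path, index=False)
    return table
```

Calling save_demographic_table(url, 'census_data.csv') with the URL from step 2 would reproduce the steps above in one call. Keeping the download separate from the parsing also makes it possible to try extract_last_table on a saved copy of the page without hitting the network.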

Conclusion

This simple script allows you to automate the extraction of demographic data from a Census report and save it in a convenient CSV format for further analysis. Adjust the URL and table index as needed for different reports.
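
To adjust the URL for a different report, the query string can be built from its parts instead of being edited by hand. This is a hedged sketch: the function name build_report_url is hypothetical, the parameter names (year, state, report, msa) are copied from the example URL in step 2, and their meanings (a report year, a two-digit state code, a report type, and an MSA code) are assumptions inferred from that URL rather than from any site documentation.

```python
from urllib.parse import urlencode

# Base address taken from the example URL in step 2.
BASE_URL = "http://www.ffiec.gov/census/report.aspx"


def build_report_url(year, state, msa, report="demographic"):
    """Assemble a report URL with the same query parameters as the example."""
    params = {
        "year": year,
        "state": state,  # kept as a string so a leading zero such as "01" survives
        "report": report,
        "msa": msa,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_report_url(2011, "01", "11500")
print(url)
```

With the arguments shown, the function yields the same URL used in step 2; whether other combinations of values return a valid report depends on the site.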