Extracting Census Data from a Web Table and Saving as CSV
This guide demonstrates how to retrieve a demographic table from a Census report webpage and save it into a CSV file using Python. We will utilize the requests and pandas libraries for this task.
Prerequisites
Make sure you have the following Python packages installed:
pip install requests pandas lxml html5lib beautifulsoup4
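Before running the rest of the guide, it can help to confirm that the packages above are importable. The following is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are illustrative and not part of the original script. Note that the beautifulsoup4 package is imported under the name bs4.

```python
from importlib.util import find_spec

# Import names for the packages installed above.
# beautifulsoup4 is imported as "bs4"; the others match their pip names.
REQUIRED_MODULES = ["requests", "pandas", "lxml", "html5lib", "bs4"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

An empty list means every module was found; any names returned point to packages that still need to be installed.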
Step-by-Step Guide
Import Required Libraries
First, we need to import the necessary libraries:
import requests
import pandas as pd
Define the URL
Specify the URL of the Census report you want to extract data from:
url = 'http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500'
Fetch the HTML Content
Use the requests library to get the content of the webpage:
response = requests.get(url)
html_content = response.content
Extract the Table
Utilize pandas to read the HTML content and extract the tables. The relevant table is usually the last one on the page:
tables = pd.read_html(html_content)
demographic_table = tables[-1]  # Adjust index if necessary
Save to CSV
Finally, save the extracted table to a CSV file:
demographic_table.to_csv('census_data.csv', index=False)
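The steps of this guide can also be gathered into small functions, which makes it easier to add a timeout and basic error handling. This is a sketch under stated assumptions rather than a definitive implementation: the function names fetch_html, extract_table, and save_table are hypothetical, the 30-second timeout is an arbitrary choice, and the HTML text is wrapped in io.StringIO so that pandas reads it as a file-like object.

```python
import io

import pandas as pd
import requests


def fetch_html(url, timeout=30):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail early on 4xx/5xx responses
    return response.text


def extract_table(html, index=-1):
    """Parse every <table> in the HTML and return the one at `index`.

    The default of -1 selects the last table on the page.
    """
    # read_html raises ValueError if the page contains no tables.
    tables = pd.read_html(io.StringIO(html))
    return tables[index]


def save_table(table, path_or_buffer):
    """Write the DataFrame as CSV without the row index."""
    table.to_csv(path_or_buffer, index=False)
```

With these in place, the whole workflow becomes save_table(extract_table(fetch_html(url)), 'census_data.csv'), where url is the report address defined earlier. If the last table turns out not to be the demographic one, pass a different index to extract_table.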
Conclusion
This simple script allows you to automate the extraction of demographic data from a Census report and save it in a convenient CSV format for further analysis. Adjust the URL and table index as needed for different reports.
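To adjust the URL for different reports, the query string can be assembled from its parts instead of edited by hand. The sketch below assumes the parameter names visible in the example URL (year, state, report, msa) are the only ones needed; their exact meaning and valid values are not documented here, and the helper build_report_url is hypothetical.

```python
from urllib.parse import urlencode

# Base address taken from the example URL used in this guide.
BASE_URL = "http://www.ffiec.gov/census/report.aspx"


def build_report_url(year, state, msa, report="demographic"):
    """Build a report URL from the query parameters seen in the example.

    Pass `state` and `msa` as strings so leading zeros (e.g. "01") are kept.
    """
    params = {"year": year, "state": state, "report": report, "msa": msa}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_report_url(2011, "01", "11500"))
```

Calling build_report_url(2011, "01", "11500") reproduces the URL used in the steps above.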