Fetching Data into Python
To analyze data with Python, we need to get access to the data and bring them into our Python scripting environment. We’ve already seen how we can read text files using Python’s built-in open
function to create a file object and read GIS tables using ArcPy’s cursor objects, but Python has several other, more effective means for accessing external data. In this session, we examine a number of helpful Python packages and how they are used to access, fetch, unpack, and manage data in various formats and from various sources.
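For reference, reading a text file with open() looks like this minimal sketch (the file name here is hypothetical):

```python
# Read a local text file with base Python's open() function
# ("data.csv" is a hypothetical file name)
with open('data.csv', 'r') as file_obj:
    lines = file_obj.readlines()  # returns a list of strings, one per line
print(lines[0])                   # e.g., show the header row
```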
Lab Prep
- Fork and clone the repository found here: https://github.com/ENV859/GettingData
- To run these exercises you will need your “gis” environment created in the Spatial DataFrames section. A shortcut to open this environment is provided in the above repository.
Session Notebooks & Learning Objectives
The specific exercise notebooks are fairly self-explanatory and review an array of methods used to access and download data from the internet. They also touch on a few concepts that we will dig deeper into in upcoming sessions.
Specific learning objectives include:
Notebook | Learning Objectives
---|---
0-Importing-Local-Files (review) | • Review how to load CSV file data into Python:<br>- Pure Python<br>- The csv module<br>- NumPy<br>- Pandas
1a-Getting-data-with-Pandas<br>1b-DEMO-Bulk-Download-with-Pandas | • Grab static on-line files with Pandas’ read_csv() function<br>• Bulk download data with Pandas
2a-Fetching-Data-with-urllib<br>2b-Extract-Statewide-HUCs-with-urllib | • Use the urllib library to send web requests and handle responses<br>• Use the zipfile package to uncompress zipped files<br>• Form URLs interactively and handle them with urllib
The recordings below are optional… |
3-Fetching-files-with-ftplib [deprecated] | • Use the ftplib package to fetch data from FTP servers:<br>- Create a link to the FTP server<br>- Log into the server (anonymously)<br>- Navigate the server’s file structure<br>- Create a list of files to fetch<br>- Iterate through each file, fetching & unzipping it
4-Grabbing-HTML-tables-with-Pandas | • Fetch data from formatted HTML pages with Pandas’ read_html() (sketched after this table)
5a-Scraping-Data-With-BeautifulSoup | • Use the requests library to build URLs programmatically<br>• Send URL requests and handle responses using requests.get()<br>• Parse raw HTML into searchable components with BeautifulSoup (also sketched below)
6-Using-specialized-packages-to-grab-data | • Use the census package to download US Census data<br>• Explain the use of “keys” in download packages
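As a preview of notebook 4, here is a minimal sketch of the read_html() pattern. The URL is just an illustrative example, and read_html() needs an HTML parser such as lxml installed alongside Pandas:

```python
import pandas as pd

# read_html() returns a list of DataFrames, one per <table> tag found on the page
# (the URL is illustrative, not necessarily the one used in the notebook)
url = 'https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States'
tables = pd.read_html(url)
df = tables[0]   # grab the first table on the page
print(df.head())
```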
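Likewise, a minimal sketch of the requests + BeautifulSoup pattern from notebook 5a (the URL and tag names here are placeholders, not the ones used in the exercise):

```python
import requests
from bs4 import BeautifulSoup

# Send the request and confirm it succeeded
response = requests.get('https://example.com')  # placeholder URL
response.raise_for_status()

# Parse the raw HTML into a searchable "soup" object
soup = BeautifulSoup(response.text, 'html.parser')

# Search the parsed document, e.g., list the address of every link on the page
for link in soup.find_all('a'):
    print(link.get('href'))
```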
→ Click on the link to fire up Jupyter in your cloned workspace and let’s go!
Video Highlights
6.3.2 Importing data from static text files
Time | Topic |
---|---|
0:00 | The data we will be importing |
0:44 | Reading data with base Python’s open() function |
1:34 | Using the csv module to read CSV files |
2:47 | Reading CSV files using NumPy’s genfromtxt() function |
4:09 | Reading CSV files using Pandas’ read_csv() function |
6:23 | Retrieving tab-delimited data from websites into a Pandas DataFrame and skipping commented lines with comment='#' (sketched after this table) |
10:44 | - Dropping rows (and columns) from dataframes with drop() |
11:09 | - Using the inplace=True modifier in Pandas |
11:38 | - Skipping lines when reading in text files with the skiprows parameter |
13:45 | Saving data to local files with to_csv() |
16:20 | Bulk downloading files with Pandas and Python |
16:40 | - Installing packages with pip inside a notebook |
18:00 | - Introducing the us package |
18:15 | - Iterating read_csv() over a dynamic URL to pull multiple datasets (sketched after this table) |
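The first part of the video boils down to a pattern like the following minimal sketch (the URL and row index are placeholders for the ones used in the video):

```python
import pandas as pd

# Read tab-delimited data directly from the web, ignoring commented lines
# (placeholder URL; the video pulls a different dataset)
url = 'https://data.example.com/streamflow.txt'
df = pd.read_csv(url, sep='\t', comment='#')

# Drop an unwanted row in place, then save the result locally
df.drop(index=0, inplace=True)
df.to_csv('streamflow.csv', index=False)
```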
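The bulk-download demo then wraps read_csv() in a loop, swapping a state identifier into the URL on each pass. A sketch of that idea, assuming a hypothetical URL template:

```python
import pandas as pd
import us   # the us package provides metadata (FIPS codes, etc.) for each state

# Hypothetical URL template; the demo uses a different data source
url_template = 'https://data.example.com/census_{fips}.csv'

# Fetch one table per state and combine them into a single DataFrame
frames = []
for state in us.states.STATES:
    url = url_template.format(fips=state.fips)
    frames.append(pd.read_csv(url))
all_data = pd.concat(frames)
```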
6.3.3 Getting data with urllib and zipfile
Time | Topic |
---|---|
0:50 | Introducing the urllib and zipfile packages |
1:12 | The data we’ll be fetching: Census data |
2:20 | Using urllib.request.urlretrieve() to fetch and save web files |
4:02 | - Running local commands in Jupyter with the ! character |
5:18 | Unzipping a local zip file with the zipfile package (see the sketch below) |
7:40 | Remainder of video is outdated (the remote server has been decommissioned) |
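The still-relevant portion of this video reduces to a pattern like this minimal sketch (the URL and file names are placeholders, since the server used in the video is gone):

```python
import urllib.request
import zipfile

# Fetch a remote zip file and save a local copy
# (placeholder URL; the server from the video has been decommissioned)
url = 'https://data.example.com/tiger_shapefile.zip'
local_file, headers = urllib.request.urlretrieve(url, 'data.zip')

# Uncompress the downloaded archive into a local folder
with zipfile.ZipFile(local_file, 'r') as zip_obj:
    zip_obj.extractall('data')
```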