Data Structures - Spatial Dataframes

ENV 859 - Geospatial Data Analytics   |   Fall 2024   |   Instructor: John Fay  

Introduction

Previously, we looked at how the NumPy and Pandas packages add new data science structures to our Python coding environment. Here we look at another new data structure: the spatial dataframe. As you might guess, it’s quite similar to a Pandas dataframe, except that the spatial dataframe has an additional column type: the geometry column. And as budding GIS experts, we know that adding a geometry field opens the door to many new analyses, namely spatial analyses! And so here we circle back to GIS in our exploration of data science!

GeoPandas, Shapely, and Fiona (and GDAL)

A few notes before we dive in. First, on popular packages and nomenclature. A package called GeoPandas is the open source standard for working with spatial dataframes. GeoPandas enables your Pandas dataframes to have a column of geometries (points, lines, and polygons, as well as multi-points, multi-lines, and multi-polygons) in addition to the numeric and text-based column types standard to Pandas. These columns of geometries are called GeoSeries, and dataframes that have a GeoSeries are called GeoDataFrames.
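
The GeoSeries/GeoDataFrame relationship can be sketched in a few lines. This is a minimal example, assuming a small made-up table of longitude/latitude values in place of a real CSV (in practice you would start from `pd.read_csv(...)`). It uses `GeoSeries.from_xy` to build point geometries and tags them as WGS 84 (EPSG:4326).

```python
import pandas as pd
import geopandas as gpd

# Made-up coordinate table; stands in for pd.read_csv("some_file.csv")
df = pd.DataFrame({
    "site": ["A", "B", "C"],
    "lon": [-78.94, -78.90, -78.64],
    "lat": [36.00, 35.99, 35.78],
})

# Build a GeoSeries of Point geometries from the coordinate columns,
# declaring that the coordinates are WGS 84 longitude/latitude
geoms = gpd.GeoSeries.from_xy(df["lon"], df["lat"], crs="EPSG:4326")

# Attach the GeoSeries to the attribute table; the result is a GeoDataFrame
# that inherits the GeoSeries' CRS
gdf = gpd.GeoDataFrame(df, geometry=geoms)

print(gdf)
print(gdf.geometry.geom_type.unique())
print(gdf.crs)
```

The original `lon`/`lat` columns remain ordinary Pandas columns; only the `geometry` column is a GeoSeries.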

When installed in your Python environment, GeoPandas also installs a number of other packages on which it depends (a.k.a. dependencies). Two related packages to note are Shapely and Fiona. (Shapely is installed automatically; recent GeoPandas releases, version 1.0 and later, read and write files with the pyogrio package by default and no longer install Fiona automatically, so the setup steps for this section install Fiona explicitly.) The Shapely package allows us to work with geometric features, much like those we worked with using ArcPy, but with Shapely, we are free from needing any ESRI license! (Instead, Shapely is built on the open source GEOS geometry engine.) In addition to adding geometric objects to our coding environment, Shapely also provides functions for doing analyses with these objects! The Fiona package, which is built on the open source GDAL library, allows us to read and write various GIS file types in Python, enabling us to work with existing files and to move data back and forth between Python and traditional GIS software like ArcGIS Pro.
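
To give a feel for Shapely on its own, here is a minimal sketch with made-up coordinates: it builds a polygon and two points, then runs a few of Shapely's spatial predicates and a buffer. No ArcPy or ESRI license is involved.

```python
from shapely.geometry import Point, Polygon

# A unit square and two points, one inside it and one outside it
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
inside = Point(0.5, 0.5)
outside = Point(2, 2)

print(square.area)               # 1.0
print(square.contains(inside))   # True
print(square.contains(outside))  # False

# Buffer the outside point by 1.5 units; the nearest corner of the square,
# (1, 1), is about 1.41 units away, so the buffer now overlaps the square
zone = outside.buffer(1.5)
print(zone.intersects(square))   # True
```

GeoPandas applies these same Shapely operations to every geometry in a GeoSeries at once.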

ArcGIS API for Python

We’ll work with GeoPandas, Shapely, and Fiona in the Jupyter Notebooks we are about to access. However, ESRI is also embracing the open source world with its ArcGIS API for Python. It has many similarities to GeoPandas and also does not require any ESRI license for its basic functionality. In a sense, the ArcGIS API for Python serves as a bridge between ArcGIS Online and Python, but it also includes a number of useful widgets for mapping and analyzing data.

There is much more to the ArcGIS API for Python than we’ll discuss here. Instead, we’ll focus on the API’s “Spatially Enabled Dataframe” (SEDF), i.e., its version of GeoPandas’ GeoDataFrame. You’ll see that the two have many similarities, and concepts learned from one translate fairly easily to the other.

Structure of our lessons

As before, we’ll learn by doing. I’ve constructed a number of Jupyter Notebooks that explore the basic concepts of both GeoPandas GeoDataFrames and ArcGIS’s spatially enabled dataframes, which I refer to interchangeably as spatial dataframes. Below is a listing of the key learning objectives associated with each notebook.

Before running these notebooks, however, we’ll have to create a new Conda environment and install the necessary packages. Instructions for that follow.


Notebooks & Learning Objectives

1a. Creating spatial dataframes using GeoPandas
• Contrast a spatial dataframe against a typical dataframe
• List the two Python libraries used for creating and working with spatial dataframes
• Create a GeoPandas GeoDataFrame from a CSV file containing spatial coordinates
  - Construct a GeoSeries object from coordinate columns in a Pandas dataframe
  - Look up the EPSG code or WKID for a given coordinate reference system
  - Construct a GeoDataFrame from a GeoSeries object and an EPSG code/WKID
• Explore the properties of GeoDataFrames
• Transform (reproject) a GeoDataFrame
• Make simple plots of GeoDataFrames
• Create a GeoDataFrame from an existing shapefile
• Understand the role the Fiona package plays in creating GeoDataFrames from various file formats
• Create a GeoDataFrame from a GeoJSON file
• Create a GeoDataFrame from an existing KML file

1b. Creating spatial dataframes using the ArcGIS API for Python
• ArcGIS Python API & Spatially Enabled Dataframes
• Convert CSV files with coordinate fields into a spatial dataframe
• Explore properties of spatial dataframes
• Create a spatial dataframe from an existing feature class
• Create a spatial dataframe from a feature layer service
• Reproject a spatial dataframe
• Make simple maps of spatial dataframe features

2. Spatial analysis w/ GeoDataFrames
• Execute the “data science workflow” with GeoPandas
  - Read data into a geodataframe (CSV and GeoJSON)
  - Explore the data: columns/column types, summaries, plots
  - Analyze the data…
  - Visualize results
• Subset features in a geodataframe by attribute
• Merge geodataframes
• Dissolve geodataframe features based on an attribute value
• Join attributes to a geodataframe
• Spatially join data from one geodataframe to another
• Generate various plots from single and multiple geodataframes
• Save a geodataframe to a feature class
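
Several of the notebook 2 objectives (subsetting by attribute, dissolving, and attribute joins) map onto single GeoPandas calls. The sketch below uses hypothetical tract polygons and a made-up income table rather than the course data, and assumes a projected CRS (UTM zone 17N, EPSG:32617); it is meant only to show the shape of each call.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Hypothetical tract-like polygons: four unit squares in two counties
tracts = gpd.GeoDataFrame(
    {
        "tract_id": ["t1", "t2", "t3", "t4"],
        "county": ["Durham", "Durham", "Wake", "Wake"],
        "pop": [1200, 800, 1500, 500],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2), box(1, 1, 2, 2)],
    crs="EPSG:32617",  # projected CRS (UTM 17N), units are meters
)

# 1. Subset by attribute: keep tracts with more than 700 people
big = tracts[tracts["pop"] > 700]

# 2. Dissolve: merge tract polygons into one polygon per county,
#    summing the population of the merged tracts
counties = tracts.dissolve(by="county", aggfunc={"pop": "sum"})

# 3. Attribute join: attach a made-up table of median incomes by tract ID
income = pd.DataFrame({
    "tract_id": ["t1", "t2", "t3", "t4"],
    "med_income": [52000, 61000, 75000, 48000],
})
joined = tracts.merge(income, on="tract_id")

print(big["tract_id"].tolist())
print(counties["pop"].to_dict())
print(type(joined).__name__)
```

Calling `merge` on the GeoDataFrame (rather than on the plain income table) keeps the result a GeoDataFrame with its geometry column intact.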

Section Prep

1. Create a new Conda environment (“gis”) and install necessary packages

This section involves several new Python packages that we’ll have to install. As we can’t be sure these won’t conflict with existing packages, we’ll create a new Conda environment and install our packages in that environment. We’ll call this environment gis to maintain consistency with my documentation and files.

Recall that to create a new environment, you first need to open your Python Command Prompt. From there, run the following commands. (You can skip the comment lines; they are there only to explain what each command does.)

:exclamation: Note that the code below is slightly different from what is in the recordings.

```
# Create a fresh environment
conda create --clone arcgispro-py3 --name gis

# Activate the environment
activate gis

# Install geopandas using pip
pip install geopandas

# Install additional packages
conda install fiona
conda install -c conda-forge contextily -y
```
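
After the installs finish, one quick way to confirm the environment is ready is to try importing each package from Python (for example, in a notebook cell running in the `gis` environment). The sketch below is a hypothetical helper, not part of the course materials; its package list mirrors the installs above plus `arcgis`, which comes along with the cloned `arcgispro-py3` environment.

```python
import importlib

# Packages the "gis" environment is expected to provide
packages = ["geopandas", "shapely", "fiona", "contextily", "arcgis"]

status = {}
for name in packages:
    try:
        module = importlib.import_module(name)
        # Most packages expose __version__; fall back to a generic marker
        status[name] = getattr(module, "__version__", "installed")
    except ImportError:
        status[name] = None  # not importable in this environment

for name, version in status.items():
    print(f"{name:12s} {version or 'MISSING'}")
```

Any package reported as MISSING suggests the corresponding install step above did not complete in the active environment.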

2. Fork and clone the SpatialDataframes repository

The notebooks for this section are included in the repository https://github.com/env859/SpatialDataFrames.

  • Fork the repository to your own GitHub account
  • Clone the forked repository (using Git Bash or GitHub Desktop) to your local machine.

3. Create a new Jupyter Notebooks shortcut and run it [optional].

The repository has a shortcut to run Jupyter Notebooks using your new environment, but it’s good practice to create one yourself.

  • Navigate into your newly cloned repository.

  • Create a new text file and rename it “RunJupyter.bat”.

  • Open the text file in a text editor and add the following lines:

    @set the_env=gis
    call "C:\Program Files\ArcGIS\Pro\bin\Python\Scripts\activate.bat" "%the_env%"
    call "%localappdata%\ESRI\conda\envs\%the_env%\scripts\jupyter-notebook.exe" %cd% 
    call "C:\Program Files\ArcGIS\Pro\bin\Python\Scripts\deactivate.bat"
    
    • Line 1 creates an environment variable named “the_env”, setting its value to “gis” (the name of the Conda environment created earlier).
    • Line 2 activates this Conda environment. (Note the use of the variable set in line 1…)
    • Line 3 starts Jupyter Notebook using the executable stored in the Conda environment’s installation folder, opening it in the current folder (%cd%).
    • Line 4 deactivates the Conda environment. This line runs once Jupyter is closed.
  • After this you should be good to go!

4. Open Jupyter and commence coding

The recording links below walk you through the code in the notebooks in your cloned workspace. The lessons are in the “lessons” folder, and completed versions of the lessons are in the “complete” folder.

Note: The environment created above has both the geopandas and arcgis packages installed, so you don't need to switch environments; this is an update since the recordings were made.


| # | Topic | Learning Objectives | Recording Time |
|---|---|---|---|
| 5.4.1.2 | Spatial Dataframes: CSV to GeoDataframe | Python packages for spatial dataframes; creating geometries from coordinate columns; creating geodataframes from CSV files; coordinate reference systems (CRS) and EPSG/WKID codes | 16:21 |
| 5.4.1.3 | Spatial Dataframes: Exploring GeoDataframes | Geodataframes as Pandas dataframes; revealing the CRS of a geodataframe; transforming geodataframes to a new CRS; plotting geodataframes | 6:53 |
| 5.4.1.4 | Spatial Dataframes: Other formats to GeoDataframe | Reading shapefiles into geodataframes; reading GeoJSON into geodataframes; Fiona drivers for other formats; reading KML files into geodataframes; a word on ESRI geodatabase files; a word on ESRI’s Open Data Hubs | 15:47 |

Again, no need to switch Conda environments; the `gis` environment created above works fine.

| # | Topic | Learning Objectives | Recording Time |
|---|---|---|---|
| 5.4.1.4 | Spatial Dataframes: CSV to Spatially Enabled Dataframes | Difference between “geodataframes” (GeoPandas) & “spatially enabled dataframes” (ArcGIS API for Python); creating SEDFs from CSV files; creating SEDFs from Pandas dataframes; exploring SEDFs | 11:37 |
| 5.4.1.5 | Spatial Dataframes: Other formats to SEDFs | SEDFs from shapefiles; SEDFs from GeoPandas dataframes (not GeoJSON or KML); reading online datasets | 11:33 |

The recording uses the append() method to combine spatial dataframes. DataFrame.append() was deprecated and has since been removed from Pandas (as of version 2.0), so the notebooks use the Pandas concat() function instead.
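
A minimal sketch of the replacement, using two hypothetical point layers that share the same columns and CRS. `pd.concat` stacks the rows and, when the inputs are GeoDataFrames with a common CRS, returns a GeoDataFrame.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Two hypothetical point layers with the same columns and CRS
north = gpd.GeoDataFrame(
    {"id": [1, 2]}, geometry=[Point(0, 1), Point(1, 1)], crs="EPSG:4326"
)
south = gpd.GeoDataFrame(
    {"id": [3]}, geometry=[Point(0, -1)], crs="EPSG:4326"
)

# Old style (no longer available in Pandas 2.x):  north.append(south)
# Current style: stack the rows with pd.concat, renumbering the index
combined = pd.concat([north, south], ignore_index=True)

print(type(combined).__name__, len(combined), combined.crs)
```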

| # | Topic | Recording Time |
|---|---|---|
| 5.4.2.1 | Spatial Analysis: Loading and Exploring EV Data | 8:35 |
| 5.4.2.2 | Spatial Analysis: Loading and Exploring Tract Data | 8:20 |
| 5.4.2.3 | Spatial Analysis: Selecting by Attribute Values | 9:39 |
| 5.4.2.4 | Spatial Analysis: Combining Geodataframes | 6:46 |
| 5.4.2.5 | Spatial Analysis: Dissolving Features | 9:04 |
| 5.4.2.6 | Spatial Analysis: Attribute Joins | 15:12 |
| 5.4.2.7 | Spatial Analysis: Computing Geometric Attributes | 8:13 |
| 5.4.2.8 | Spatial Analysis: Spatial Subsets - Intersections | 10:57 |
| 5.4.2.9 | Spatial Analysis: Spatial Joins | 6:07 |
| 5.4.2.10 | Spatial Analysis: Sharing Your Work | 2:55 |
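
The later topics (computing geometric attributes, spatial subsets, and spatial joins) can be previewed with a tiny synthetic example. The tracts and points below are hypothetical stand-ins for the course's census tracts and EV charging stations, placed in a projected CRS (EPSG:32617) so that areas come out in square meters. The spatial subset uses a simple `intersects` test, which may differ from the approach taken in the recording.

```python
import geopandas as gpd
from shapely.geometry import Point, box

crs = "EPSG:32617"  # projected CRS, so area is in square meters

# Hypothetical tracts (two 1 km x 1 km squares) and charging stations
tracts = gpd.GeoDataFrame(
    {"tract_id": ["t1", "t2"]},
    geometry=[box(0, 0, 1000, 1000), box(1000, 0, 2000, 1000)],
    crs=crs,
)
stations = gpd.GeoDataFrame(
    {"station": ["s1", "s2", "s3"]},
    geometry=[Point(200, 300), Point(800, 900), Point(1500, 500)],
    crs=crs,
)

# Geometric attribute: tract area, converted from m^2 to km^2
tracts["area_km2"] = tracts.geometry.area / 1e6

# Spatial join: tag each station with the tract it falls within
stations_in = gpd.sjoin(stations, tracts, how="left", predicate="within")

# Count stations per tract and attach the counts back to the tracts
counts = stations_in.groupby("tract_id").size()
tracts["n_stations"] = tracts["tract_id"].map(counts).fillna(0).astype(int)

# Spatial subset: keep only the stations that intersect tract t1
t1_geom = tracts.loc[tracts["tract_id"] == "t1", "geometry"].iloc[0]
stations_t1 = stations[stations.intersects(t1_geom)]

print(tracts[["tract_id", "area_km2", "n_stations"]])
print(stations_t1["station"].tolist())
```

Computing area only makes sense in a projected CRS; in EPSG:4326 the same call would return square degrees.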