Data Structures - Spatial Dataframes
Introduction
Previously, we looked at how the NumPy and Pandas packages add new data science structures to our Python coding environment. Here we look at another new data structure: the spatial dataframe. As you might guess, it’s quite similar to a Pandas dataframe, except that the spatial dataframe has an additional column type: the geometry column. And as budding GIS experts, we know that adding a geometry field opens the door to many new analyses, i.e., spatial analyses! And so here we circle back to GIS in our exploration of data science.
Geopandas, Shapely, and Fiona (and GDAL)
A few notes before we dive in. First, on popular packages and nomenclature. GeoPandas is the open source standard package for working with spatial dataframes. GeoPandas enables your Pandas dataframes to have a column of geometries (points, lines, and polygons, as well as multi-points, multi-lines, and multi-polygons) in addition to the numeric and text-based column types standard to Pandas. These columns of geometries are called GeoSeries, and dataframes that contain a GeoSeries are called GeoDataFrames.
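To make this concrete, here is a minimal sketch (not taken from the lesson notebooks) that turns an ordinary Pandas dataframe with longitude/latitude columns into a GeoDataFrame. The site names and coordinates are hypothetical sample values; `gpd.points_from_xy` builds the point geometries, and the `crs` argument records that the coordinates are WGS 84 (EPSG:4326).

```python
import pandas as pd
import geopandas as gpd

# Hypothetical sample data: three sites with longitude/latitude columns
df = pd.DataFrame({
    "name": ["Site A", "Site B", "Site C"],
    "lon": [-78.94, -78.90, -78.64],
    "lat": [36.00, 35.99, 35.78],
})

# Build an array of Point geometries from the coordinate columns
geometry = gpd.points_from_xy(df["lon"], df["lat"])

# Combine the ordinary columns and the geometries into a GeoDataFrame,
# tagging it with the WGS 84 coordinate reference system (EPSG:4326)
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

print(type(gdf["geometry"]).__name__)  # GeoSeries
print(gdf.dtypes)
```

The regular `name`, `lon`, and `lat` columns behave exactly as they do in Pandas; the only new ingredient is the geometry column and its CRS.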
When installed in your Python environment, GeoPandas also installs a number of other packages on which it depends (a.k.a. dependencies). The ones to note include Shapely and Fiona. The Shapely package allows us to work with geometric features, much like those we worked with using ArcPy, but with Shapely we are free from needing any ESRI license! (Instead, Shapely uses the open source GEOS geometry engine.) In addition to adding geometric objects to our coding environment, Shapely also provides functions for analyzing these objects. The Fiona package, built on the open source GDAL/OGR library, allows us to read and write various GIS file types in Python, enabling us to work with existing files and making it easy to move data back and forth between Python and traditional GIS software like ArcGIS Pro.
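As a quick illustration of what Shapely provides, the sketch below creates a couple of geometries and runs a few of Shapely's measurement, predicate, and set-operation methods on them. The coordinates are arbitrary; note that Shapely itself works in plain planar coordinates and has no notion of a coordinate reference system (that is something GeoPandas adds).

```python
from shapely.geometry import Point, Polygon

# A 10 x 10 square polygon and a point at its center (arbitrary planar units)
square = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
pt = Point(5, 5)

# Measurements
print(square.area)    # 100.0
print(square.length)  # 40.0 (the perimeter)

# Spatial predicates
print(pt.within(square))    # True
print(square.contains(pt))  # True

# Operations return new geometries
circle = pt.buffer(2)                  # polygon approximating a circle of radius 2
clipped = circle.intersection(square)  # the circle lies inside the square
print(clipped.geom_type)               # Polygon
print(pt.distance(Point(8, 9)))        # 5.0
```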
ArcGIS API for Python
We’ll deal with Geopandas, Shapely and Fiona in the Jupyter Notebooks we are about to access. However, ESRI is also embracing the open source world with its ArcGIS API for Python. It has many similarities to GeoPandas and also does not require any ESRI license for its basic functionality. In a sense, the ArcGIS Python API serves as a bridge between ArcGIS Online and Python, but it also includes a number of useful widgets for mapping and analyzing data.
There is much more to the ArcGIS Python API than we’ll discuss here. Instead, we’ll focus on the API’s “Spatially Enabled Dataframe”, i.e., its version of GeoPandas’ GeoDataFrame. You’ll see that the two have many similarities, and concepts learned in one translate fairly easily to the other.
Structure of our lessons
As before, we’ll learn by doing. I’ve constructed a number of Jupyter Notebooks that explore the basic concepts of both GeoPandas GeoDataFrames and ArcGIS’s spatially enabled dataframes, which I call “spatial dataframes” interchangeably. Below is a list of the key learning objectives associated with each notebook.
Before running these notebooks, however, we’ll have to create a new Conda environment and install the necessary packages. Instructions for that follow.
Resources
- http://geopandas.org/index.html
- https://developers.arcgis.com/python/guide/introduction-to-the-spatially-enabled-dataframe/
- https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#spatialdataframe
Notebooks & Learning Objectives
Notebook | Learning Objectives |
---|---|
1a. Creating spatial dataframes using GeoPandas | • Contrast a spatial dataframe against a typical dataframe • List the two Python libraries used for creating and working with spatial dataframes • Create a GeoPandas GeoDataFrame from a CSV file containing spatial coordinates - Construct a GeoSeries object from coordinate columns in a Pandas dataframe - Look up the EPSG code or WKID for a given coordinate reference system - Construct a GeoDataFrame from a GeoSeries object and an EPSG code/WKID • Explore the properties of GeoDataFrames • Transform (reproject) a GeoDataFrame • Make simple plots of GeoDataFrames • Create a GeoDataFrame from an existing shapefile • Understand the role the Fiona package plays in creating GeoDataFrames from various file formats • Create a GeoDataFrame from a GeoJSON file • Create a GeoDataFrame from an existing KML file |
1b. Creating spatial dataframes using the ArcGIS API for Python | • ArcGIS Python API & Spatially Enabled Dataframes • Convert CSV files with coordinate fields into a Spatial Dataframe • Explore properties of spatial dataframes • Create a Spatial Dataframe from an existing feature class • Create a Spatial Dataframe from a feature layer service • Reproject a Spatial Dataframe • Make simple maps of spatial dataframe features |
2. Spatial analysis w/ GeoDataFrames | • Execute the “data science workflow” with GeoPandas - Read data into a geodataframe (CSV and GeoJSON) - Explore the data: columns/column types, summaries, plots - Analyze the data… - Visualize results • Subset features in a geodataframe by attribute • Merge geodataframes • Dissolve geodataframe features based on an attribute value • Join attributes to a geodataframe • Spatially join data from one geodataframe to another • Generate various plots from single and multiple geodataframes • Save a geodataframe to a feature class |
Section Prep
1. Create a new Conda environment (“gis”) and install necessary packages
This section involves several new Python packages that we’ll have to install. Because we can’t be sure these won’t conflict with existing packages, we’ll create a new Conda environment and install our packages in that environment. We’ll call this environment gis to maintain consistency with my documentation and files.
Recall that to create a new environment, you first need to open your Python Command Prompt. From there, run the following commands. (You can skip the comment lines; they are there only to explain what each command does.)
Note that the code below is slightly different than what is in the recordings.
# Create a fresh environment
conda create --clone arcgispro-py3 --name gis
# Activate the environment
activate gis
# Install geopandas using pip
pip install geopandas
# Install additional packages
conda install fiona
conda install -c conda-forge contextily -y
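Once the installs finish, it’s worth confirming that the new packages are visible from the gis environment. Here is a minimal sketch using only Python’s standard library (`importlib.util.find_spec`, which reports whether a package can be found without importing it). You might save it as a script (the name check_env.py is just a suggestion) and run `python check_env.py` while gis is active. The package list mirrors the commands above, plus `shapely` (a GeoPandas dependency) and `arcgis` (which should come with the cloned ArcGIS Pro environment).

```python
import importlib.util

# Packages we expect the gis environment to provide
REQUIRED = ["geopandas", "shapely", "fiona", "contextily", "arcgis"]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:12} {'OK' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed with the corresponding command above.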
2. Fork and clone the SpatialDataFrames repository
The notebooks for this section are included in the repository https://github.com/env859/SpatialDataFrames.
- Fork the repository to your own GitHub account
- Clone the forked repository (using Git Bash or GitHub desktop) to your local machine.
3. Create a new Jupyter Notebooks shortcut and run it [optional].
The repository has a shortcut to run Jupyter Notebooks using your new environment, but it’s good practice to create one yourself.
- Navigate into your newly cloned repository.
- Create a new text file and rename it “RunJupyter.bat”.
- Open the text file in a text editor and add the lines:

@set the_env=gis
call "C:\Program Files\ArcGIS\Pro\bin\Python\Scripts\activate.bat" "%the_env%"
call "%localappdata%\ESRI\conda\envs\%the_env%\scripts\jupyter-notebook.exe" %cd%
call "C:\Program Files\ArcGIS\Pro\bin\Python\Scripts\deactivate.bat"

- Line 1 creates an environment variable named “the_env” and sets its value to “gis” (the name of the Conda environment created earlier).
- Line 2 activates this Conda environment. (Note the use of the variable set in line 1.)
- Line 3 launches the Jupyter Notebook executable stored in the Conda environment’s installation folder, starting it in the current directory (%cd%).
- Line 4 deactivates the Conda environment. This line runs once Jupyter is closed.
After this, you should be good to go!
4. Open Jupyter and commence coding
The recording links below walk you through the code in the notebooks in your cloned workspace. The lessons are in the “lessons” folder, and completed versions of the lessons are in the “complete” folder.
Note: The environment created above has both the geopandas and arcgis packages installed, so you don't need to switch environments. This is an update since the recordings were made.
Recording links for 1a-Creating-spatial-data-frames-with-geopandas.ipynb
# | Topic | Learning Objectives | Recording Time |
---|---|---|---|
5.4.1.2 | Spatial Dataframes: CSV to GeoDataframe | • Python packages for spatial dataframes • Creating geometries from coordinate columns • Creating geodataframes from CSV files • Coordinate reference systems (CRS) and EPSG/WKID codes | 16:21 |
5.4.1.3 | Spatial Dataframes: Exploring GeoDataframes | • Geodataframes as Pandas dataframes • Revealing the CRS of a geodataframe • Transforming geodataframes to a new CRS • Plotting geodataframes | 6:53 |
5.4.1.4 | Spatial Dataframes: Other formats to GeoDataframe | • Reading shapefiles into geodataframes • Reading GeoJSON into geodataframes • Fiona drivers for other formats • Reading KML files into geodataframes • A word on ESRI geodatabase files • A word on ESRI’s Open Data Hubs | 15:47 |
Recording links for 1b-Creating-spatial-data-frames-with-ArcGIS-API.ipynb
Again, there's no need to switch Conda environments; the `gis` environment created above works fine.
# | Topic | Learning Objectives | Recording Time |
---|---|---|---|
5.4.1.4 | Spatial Dataframes: CSV to Spatially Enabled Dataframes | • Difference between “geodataframes” (Geopandas) & “spatially enabled dataframes” (ArcGIS API for Python) • Creating SEDFs from CSV files • Creating SEDFs from Pandas dataframes • Exploring SEDFs | 11:37 |
5.4.1.5 | Spatial Dataframes: Other formats to SEDFs | • SEDFs from shapefiles • SEDFs from geopandas dataframes (not GeoJSON or KML) • Reading on-line datasets | 11:33 |
Recording links for 2a-Spatial-analysis-with-GeoPandas.ipynb
The recording uses the `append()` method to combine spatial dataframes. That method has been deprecated (and was removed in pandas 2.0); you now use the pandas `concat()` function instead, as is done in the notebooks.
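Here is a minimal sketch of the replacement, using two small hypothetical GeoDataFrames. `pd.concat` stacks their rows, and because both inputs are GeoDataFrames sharing the same CRS, the result is still a GeoDataFrame with that CRS. Setting `ignore_index=True` renumbers the rows instead of keeping each input's (overlapping) index labels.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Two hypothetical GeoDataFrames with the same columns and CRS
gdf_a = gpd.GeoDataFrame({"id": [1, 2]}, geometry=[Point(0, 0), Point(1, 1)], crs="EPSG:4326")
gdf_b = gpd.GeoDataFrame({"id": [3]}, geometry=[Point(2, 2)], crs="EPSG:4326")

# Old approach (no longer available in pandas 2.x): combined = gdf_a.append(gdf_b)
# Current approach: stack the rows with pd.concat
combined = pd.concat([gdf_a, gdf_b], ignore_index=True)

print(type(combined).__name__)  # GeoDataFrame
print(combined)
```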
# | Topic | Recording Time |
---|---|---|
5.4.2.1 | Spatial Analysis: Loading and Exploring EV Data | 8:35 |
5.4.2.2 | Spatial Analysis: Loading and Exploring Tract Data | 8:20 |
5.4.2.3 | Spatial Analysis: Selecting by Attribute Values | 9:39 |
5.4.2.4 | Spatial Analysis: Combining Geodataframes | 6:46 |
5.4.2.5 | Spatial Analysis: Dissolving Features | 9:04 |
5.4.2.6 | Spatial Analysis: Attribute Joins | 15:12 |
5.4.2.7 | Spatial Analysis: Computing Geometric Attributes | 8:13 |
5.4.2.8 | Spatial Analysis: Spatial Subsets - Intersections | 10:57 |
5.4.2.9 | Spatial Analysis: Spatial Joins | 6:07 |
5.4.2.10 | Spatial Analysis: Sharing Your Work | 2:55 |