Geoprocessing Workflows in Python
Introduction & Learning Objectives
Having covered the basics of Python, including how to work with built-in and third-party packages such as ArcPy, we are now ready to explore how spatial analysis can be done with Python. Using the Hurricane Mapping tool as a template, we’ll replicate the geoprocessing workflow we produced in the ArcGIS Pro geoprocessing modeler, but now in a fully transparent, fully reproducible Jupyter notebook. In doing so, we’ll cover the following:
Learning Objectives:
- Hone basic Python coding skills by effectively interacting with the ESRI help documentation on ArcPy.
- Develop scripting skills while recognizing that writing scripts is an iterative, non-linear process.
- Create and maintain an organized coding workspace.
- Import and apply the ArcPy package alongside other commonly used Python packages.
- Identify and implement the correct syntax for ArcPy geoprocessing tools.
- Define and manage coding variables, including pathnames to spatial datasets.
- Use the pathlib package’s Path class to create and manage relative paths.
- Configure ArcPy environment variables using the arcpy.env module.
- Work with and manage outputs from geoprocessing tools.
- Execute multiple geoprocessing tools in sequence within a script.
- Incorporate process update messages and map outputs into code for clarity and documentation.
✅ Task 1: Preparing the Workspace
As with all our spatial analysis tasks, everything begins with creating a tidy project workspace with subfolders to keep files organized. We will continue to place our data in a data folder, but separate data into “raw” and “processed” subfolders, with original, imported datasets going into the raw folder, and the results of our processing going into the processed folder. We’ll also create a “src” folder (short for “source”), which is a convention in many software development projects and also adds to our workspace’s organization.
- Create a project folder on your V: drive. Name it whatever you want, but be sure it includes no spaces or unusual characters.
- Within this project folder, create folders for your data:
  - First create a “data” folder, and within this data folder create folders called “raw” and “processed”.
- Within this project folder, create a folder for your code and then create an initial notebook file:
  - Create a folder named “src”.
  - In this folder, create a new text file, renaming it “HurricaneTracker_v1.ipynb”.
- Download and unzip the North Atlantic IBTrACS point feature class into your raw data folder.
  Note: the original link for these data is here. The link provided above is to the same data, but with the shapefile renamed as “IBTrACS_NA.shp”, because the original shapefile name, with its multiple “.” characters, causes issues with ArcGIS Pro and ArcPy. A metadata README file is included with this dataset.
- Create a Readme.txt file in the project folder, and in this file include a short description of the project, your email, and the date.
Your workspace should now resemble this schematic:
Project_folder/
|
├ data/
| |
| ├ raw/
| | ├ IBTrACS_NA.dbf
| | ├ IBTrACS_NA.prj
| | ├ IBTrACS_NA.shp
| | └ IBTrACS_NA.shx
| |
| └ processed/
|
├ src/
| |
| └ HurricaneTracker_v1.ipynb
|
└ Readme.txt
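The same layout can also be created from Python rather than by hand. The sketch below is one way to do it with pathlib; the function name build_workspace and the temporary directory used as the project root are assumptions for illustration, so substitute your own project folder on the V: drive.

```python
from pathlib import Path
import tempfile


def build_workspace(project_root: Path) -> list[Path]:
    """Create the data/raw, data/processed, and src folders plus a Readme.txt."""
    folders = [
        project_root / "data" / "raw",
        project_root / "data" / "processed",
        project_root / "src",
    ]
    for folder in folders:
        # parents=True creates "data" on the way; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
    # touch() creates an empty Readme.txt and leaves an existing file's contents alone
    (project_root / "Readme.txt").touch(exist_ok=True)
    return folders


# Demonstration in a throwaway directory (hypothetical project name)
demo_root = Path(tempfile.mkdtemp()) / "HurricaneMapper_arcpy"
created = build_workspace(demo_root)
for folder in created:
    print(folder.relative_to(demo_root).as_posix())
```

The shapefile and the notebook still need to be added to the raw and src folders afterwards.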
✅ Task 2: Initialize your notebook and add a short description
One main advantage of Jupyter notebooks is that Markdown cells make our code easy to follow. So we often start with a Markdown cell that provides a little background on what the notebook will do. I also find that adding a short description, sometimes even a bulleted workflow, helps me stay focused on the coding task at hand.
- Open your project folder in VSCode.
- Open the Jupyter notebook file in the VSCode Editor.
- Set the Kernel to use the arcgispro-py3 kernel.
- Add a markdown cell, and in that cell add:
  - A notebook title, in a large font
  - A short description of what the code will do
  - Your name and the date
✅ Task 3: Import packages
Scripts and notebooks using packages typically import those packages early on in the code. This practice allows others to see up front what packages are required before running any code.
- Add a new code cell to your notebook.
- Add a comment line indicating what this cell does:
  #Import packages
- Import the ArcPy module:
  import arcpy
- From the pathlib package, import the Path class:
  from pathlib import Path
✅ Task 4: Use ArcPy to subset the TrackPoint shapefiles
Similar to adding a new process to our ArcGIS Pro geoprocessing modeler, we’ll start by adding code to execute a single tool, in this case the Select tool. Here, of course, we can’t simply drag and drop the tool into our code; instead, we need to determine the proper syntax for the tool we want to use.
4.1 Get the syntax for the tool
You have many options for identifying the proper syntax of a tool, but the most reliable is the ArcGIS Pro help.
- Open the ArcGIS Pro on-line help: https://pro.arcgis.com/en/pro-app/latest/help/main/welcome-to-the-arcgis-pro-app-help.htm
- Click on the Tool Reference menu bar and expand the Geoprocessing Tools menu list on the left side.
- Find the Select tool in the Analysis toolbox, then navigate to the Parameters section of its page and click the Python tab to expose the Python syntax for the tool.
Notes on ArcPy geoprocessing tools
A few important facets of the structure and syntax of all ArcPy geoprocessing tools:
- The tool names are preceded by “arcpy.” and then the name of the toolbox in which the tool is found: arcpy.analysis.Select.
- Geoprocessing tools can have a mix of required and optional parameters; the latter are encased in curly braces (“{}”) in the documentation.
- For tools that have a spatial dataset as an input or output parameter, we provide the path string to the dataset.
So, to execute the Select tool, we need to provide the input feature class (‘in_features’), the output feature class (‘out_feature_class’), and optionally the selecting SQL expression (‘where_clause’).
4.2 Code the tool
Add a new code cell and type in the Select command, setting the parameters as follows:
Parameter | Value |
---|---|
in_features | The absolute path to the IBTrACS shapefile… |
out_feature_class | The absolute path to your “processed” folder followed by “\selected_points.shp” |
where_clause | "SEASON = 2018 And NAME = 'FLORENCE'" |
While not necessary for code execution, I recommend making your code as legible as possible. This means: include the parameter names in your code, and enter each parameter on a separate line. You should also precede the code with a comment line for clarity.
Thus, your code would look something like:
#Select track points corresponding to a single storm
arcpy.analysis.Select(
in_features = "V:\\HurricaneMapper_arcpy\\data\\raw\\IBTrACS_NA.shp",
out_feature_class = "V:\\HurricaneMapper_arcpy\\data\\processed\\selected_points.shp",
where_clause = "SEASON = 2018 And NAME = 'FLORENCE'"
)
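The doubled backslashes in these paths are needed because a single backslash starts an escape sequence in an ordinary Python string. As a small illustration (not part of the notebook itself), the sketch below shows that a raw string, prefixed with r, produces the same path text, and what can go wrong when a backslash is left single.

```python
# Two ways of writing the same Windows path as a Python string
escaped_path = "V:\\HurricaneMapper_arcpy\\data\\raw\\IBTrACS_NA.shp"
raw_path = r"V:\HurricaneMapper_arcpy\data\raw\IBTrACS_NA.shp"

# Both hold single backslashes once Python has parsed them
print(escaped_path)
print(escaped_path == raw_path)

# Here the last backslash was left single, so "\r" became a carriage return
broken_path = "V:\\HurricaneMapper_arcpy\\data\raw"
print("\r" in broken_path)
```

Either style works in the Select tool; what matters is using one of them consistently.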
4.3 Test the tool
Now run your code. If all goes well, it should generate a message that looks something like the one shown below, and you should have a new shapefile, selected_points.shp, in your processed folder.
Messages
Succeeded at Wednesday, April 9, 2027 8:39:03 AM (Elapsed Time: 1.52 seconds)
Debugging…
If your tool fails to run, you’ll have to debug your error. As you work more with Python and geoprocessing tools, your knack for debugging will improve as you’ll build a mental list of the more common sources of errors. For this, and most geoprocessing tools, the common errors are pesky typos in the tool name or its parameters:
- If a geoprocessing tool cannot locate an input dataset from its path, then check the path. Some things to try:
  - Add a new code cell and print the path as typed in your code:
    print("V:\\HurricaneMapper_arcpy\\data\\raw\\IBTrACS_NA.shp")
    Does it look correct? It’s easy to overlook the backslashes in paths…
  - See if ArcPy can find the file with the arcpy.Exists() command:
    arcpy.Exists("V:\\HurricaneMapper_arcpy\\data\\raw\\IBTrACS_NA.shp")
    If the function returns False, then something is wrong with the path or the dataset itself.
- The error might also be with the “where_clause”, which must follow SQL syntax. SQL can be confusing as to when you include quotes around the value you are selecting and when you don’t. (Note: single quotes go around text values and no quotes around numeric ones.) If in doubt, you can create the query in ArcGIS Pro using the non-SQL interface and see whether quotes are used or not. Furthermore, coding SQL statements can be tricky because the statement is itself a string that often contains quotes.
  - Check that the where_clause looks correct by printing it:
    print("SEASON = 2018 And NAME = 'FLORENCE'")
- Other debugging approaches include dropping all optional parameters (here, the where_clause) and seeing if the tool runs. Do anything you can to get the tool to run, and then amend the tool bit by bit to see where, precisely, the error occurs and try to identify what’s causing it.
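When a path check fails, it can help to know which part of the path is wrong. The helper below is a minimal sketch using only pathlib (the name first_missing_part is made up for this example); it walks down a path and reports the first folder or file that does not exist. It checks the file system only, so it complements rather than replaces arcpy.Exists().

```python
from pathlib import Path
import tempfile


def first_missing_part(path_text: str):
    """Return the shallowest part of the path that does not exist, or None."""
    path = Path(path_text)
    # path.parents runs from the immediate parent up to the root, so reverse it
    for candidate in [*reversed(list(path.parents)), path]:
        if not candidate.exists():
            return candidate
    return None


# Demonstration with a temporary folder standing in for the project workspace
project = Path(tempfile.mkdtemp())
(project / "data" / "raw").mkdir(parents=True)

good = project / "data" / "raw"
typo = project / "data" / "rwa" / "IBTrACS_NA.shp"

print(first_missing_part(str(good)))   # every part exists
print(first_missing_part(str(typo)))   # points at the misspelled folder
```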
✅ Task 5: Streamline the Select tool
Your Select tool runs, but it can be made more robust with a few changes. First, the storm season and name are hard-coded in the tool; we’ll want those as user input variables, so we’ll replace them with storm_season and storm_name variables. Also, the paths used in the tool are absolute paths, and relative paths would enable our code to be run wherever we place our coding workspace. And finally, setting the output feature class to a variable will facilitate using this output as the input to subsequent geoprocessing tools.
So let’s streamline our tool to make it more robust and to facilitate later geoprocessing operations. First, however, we’ll take a moment to learn two useful objects for working with paths in ArcPy scripts: ArcPy’s env module, and the pathlib package’s Path object.
5.1 ArcPy’s env module
Try running your script again and you’ll get an error that the output already exists. The obvious solution is to go to the processed folder and delete the output shapefile. But there is a setting that tells ArcPy it’s OK to overwrite outputs, and that setting is in the ArcPy env module. This module is used to get and set various environment settings, similar to where we set the default and scratch workspaces in ArcGIS Pro.
A list of the settings and other operations you can access through arcpy.env is listed here. On this list, you’ll see a setting called overwriteOutput. Setting that to True will enable us to run our Select tool repeatedly without having to manually delete the output, which is quite helpful when writing and debugging code.
5.1.1 Add code to allow overwriting output
- Add a new code cell just below the one where you imported your packages.
- In this code cell, insert the following code:
  #Allow arcpy to overwrite output
  arcpy.env.overwriteOutput = True
- Now run the Select tool again. It should work, overwriting any existing output layer!
We can also use arcpy.env to set the default and scratch workspace so that we can omit paths. Let’s see how this works.
5.1.2 Add code to set default paths to our data
- In the same code cell you just created, add the following lines of code, replacing the path with the path to your raw folder.
  #Set the default workspace
  arcpy.env.workspace = "V:\\HurricaneMapper_arcpy\\data\\raw"
- Now, remove the folder path from the in_features parameter of your Select tool, leaving just the shapefile name, and re-run it.
  Note: While we can set the arcpy.env.scratchWorkspace variable as well, ArcPy doesn’t use it as ArcGIS Pro does. This scratch workspace is only used for tools that output a folder, not a file. If we omitted the path from the Select tool’s out_feature_class parameter, the output would also go to the arcpy.env.workspace folder.
Full documentation of the arcpy.env module is here.
5.2 The pathlib package’s Path object
We still have an absolute path in our arcpy.env.workspace statement, meaning if we moved our workspace to a different folder, it wouldn’t work without editing the code to update the path. The Path object that we imported in the first code cell can help with this: Path.cwd() returns the current working directory, which is the directory in which the notebook is located.
5.2.1 Explore the Path object
- Create a new code cell for exploring the Path object. Place this between the 1st and 2nd code cells, i.e., between where we import packages and apply the ArcPy environment settings.
- In this new code cell, add and run the code:
  Path.cwd()
  The Path.cwd() command returns the absolute path to the folder in which the notebook is. If we were to move our project workspace to the C:\Temp folder, this would return WindowsPath('c:/temp/HurricaneMapper_arcpy/src'). The object returned is a Path object that we can use to navigate out of (via its parent) or into (via forward slashes), as we’ll see next.
- Now change the code to:
  Path.cwd().parent
  and run. Do you see what the parent of the folder is?
- Next, change the code to:
  raw_folder_path = Path.cwd().parent / 'data' / 'raw'
  print(raw_folder_path)
  print(raw_folder_path.exists())
  print(type(raw_folder_path))
  We’ve assigned the path to our raw folder to a variable. Note that we start with the current working folder (src), navigate to its parent (the project root folder), and then into the data folder and finally to the raw subfolder. The exists() command confirms that this is a valid path, and the type() command reminds us that the raw_folder_path variable is a Path object, not a string.
- Remove the print statements, as they were for demonstration purposes only.
- Add a comment above the code creating the raw_folder_path variable describing what the code is doing.
5.2.2 Create a variable pointing to the processed folder
- In the same code cell above, add some more code to create a variable called processed_folder_path, setting its value to the processed folder path (just as you did for the raw folder…).
5.2.3 Set the arcpy.env working directory to a relative path
- Edit the code setting your ArcPy default workspace so that it points to the raw_folder_path object. Note, however, that you’ll first need to convert raw_folder_path from a Path object to a string object, which can be done with the str() function: str(raw_folder_path)
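The navigation steps in this section can be tried without touching the file system. The sketch below uses PureWindowsPath, a pathlib class that manipulates Windows-style paths on any operating system, in place of Path.cwd(); the starting folder is the hypothetical C:\Temp location used as an example earlier in this section.

```python
from pathlib import PureWindowsPath

# Stand-in for what Path.cwd() would return inside the src folder
src_folder = PureWindowsPath("c:/temp/HurricaneMapper_arcpy/src")

# .parent steps out of a folder; the / operator steps into one
project_folder = src_folder.parent
raw_folder_path = project_folder / "data" / "raw"
processed_folder_path = project_folder / "data" / "processed"

# Path objects are not strings; str() gives the text form a path setting expects
workspace_text = str(raw_folder_path)
print(type(raw_folder_path).__name__)
print(workspace_text)
```

In the notebook itself, Path.cwd() replaces the hard-coded starting folder, which is what makes the paths relative to wherever the project lives.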
5.3 Streamline the Select tool code with variables
Now we’ll improve our Select tool code by including variables for the where clause and using the memory workspace for the output. This will make our code both more robust and easier to modify if we want to change values like the storm selection criteria or tool inputs/outputs.
5.3.1 Set and use variables for selecting the storm
By pulling the storm season and storm name out as variables, it’s easier to locate and update these two values as opposed to editing the “where_clause” of the Select tool. Likewise, creating a variable for the Select tool’s output allows us to use its output in subsequent tools more easily. Lastly, we’ll set the output to be an “in-memory” feature class, which will speed up our tool’s execution, as this is an intermediate data layer.
- Create a code cell above the one where the Select tool is run.
- Create two variables: storm_season and storm_name, setting them to “2018” and “FLORENCE” respectively.
- In the same code cell, add another variable named “selected_points”, setting its value to “memory\\selected_points”.
- In the code cell where the Select tool is run, alter the parameters:
  - Set the out_feature_class to be the “selected_points” variable.
  - Using string formatting, modify the where_clause to incorporate the storm_season and storm_name variables.
- Restart the kernel and run your script to check for issues.
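One way to build the where_clause from the two variables is with an f-string, as sketched below. Treat it as one possible approach; the point to notice is that the numeric SEASON value is inserted without quotes while the text NAME value is wrapped in single quotes.

```python
# User-editable selection criteria
storm_season = 2018
storm_name = "FLORENCE"

# SEASON is numeric (no quotes); NAME is text (single quotes inside the string)
where_clause = f"SEASON = {storm_season} And NAME = '{storm_name}'"
print(where_clause)
```

The same f-string works whether storm_season is stored as a number or as text, since either is inserted as plain characters.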
5.4 [Optional] Use ArcPy’s GetCount() to check your output
With our output being sent to memory, it’s hard to check whether the Select tool worked properly. One way to be sure is to set the tool output to go to our processed folder and open it in ArcGIS Pro. Another, perhaps easier way is to use ArcPy’s GetCount() function to do a quick check.
- Add a new code cell to your notebook below the one in which you apply the Select tool.
- Add the code:
  arcpy.management.GetCount(selected_points)
  and run.
- Hurricane Florence should have 156 points associated with it.
- Try modifying your code to check with Hurricane Katrina in 2005, which should return 64 points.
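GetCount() returns a Result object rather than a plain number, and the count it holds comes back as text. The sketch below shows the conversion step; FakeResult is a hypothetical stand-in written for this example so the code runs without ArcGIS Pro, and in the notebook you would pass the object returned by arcpy.management.GetCount(selected_points) instead.

```python
class FakeResult:
    """Hypothetical stand-in for an ArcPy Result holding a single output."""

    def __init__(self, value: str):
        self._value = value

    def getOutput(self, index: int) -> str:
        # Mimics Result.getOutput(0), which returns the count as a string
        return self._value


def count_from_result(result) -> int:
    """Convert the first output of a GetCount-style result to an integer."""
    return int(result.getOutput(0))


point_count = count_from_result(FakeResult("156"))
print(f"Selected {point_count} track points")
```

Converting to an integer makes it possible to compare the count against an expected value in code rather than by eye.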
✅ Task 6: Continue the workflow:
6.1 Convert the selected storm points to a storm track line
Repeat the steps above, but for the Points To Line tool (in the Data Management toolbox). You’ll need to look up the syntax for the tool in the ArcGIS Pro help.
- Set the Input_Features parameter to the output of the Select tool (the variable created in Step 5.3.1).
- Create a variable called storm_track to hold the Output_Feature_Class in the same code cell where you created a variable to hold the output of the Select tool. Set the variable’s value to be “memory\\Tracklines”, another memory layer.
- Be sure to sort the line on the “ISO_TIME” field.
6.2 Select Counties that intersect the storm track line
We’ll now implement the Select Layer By Location tool to select US counties that intersect our storm track line. Recall that the US Counties dataset is an on-line feature service, accessed by providing its URL, so we’ll have to investigate how that works in our geoprocessing tool.
- Begin by reviewing the Python code for the Select Layer By Location tool in the ArcGIS Pro documentation. Note that we’ll need to code two parameters for the tool: in_layer and select_features.
  - The in_layer parameter will be the URL associated with our US Counties feature layer.
  - The select_features parameter will be the variable referencing the storm track line.
- We could hard-code the URL of the US Counties feature layer in the Select Layer By Location tool and it would work fine. However, setting a variable to this URL earlier in our script, where we set other variables (e.g. for storm season/name and the Select and Points To Line outputs), is good practice. Why? So that if the link breaks (as can certainly happen with on-line datasets) or if we want to re-use the US Counties layer somewhere else in our code, we can find and update it more easily than if we had to search for the process where the URL is used.
  So, in the code cell where you set variables for the storm season and name, add a new line setting a new variable called usa_counties to the URL of the US Counties feature service:
  usa_counties = 'https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0'
  [The code should all be on a single line in your notebook.]
- Now, add a new code cell at the end of your notebook and insert the code for the Select Layer By Location tool:
  #Process: Select by location
  arcpy.management.SelectLayerByLocation(
      in_layer=usa_counties,
      select_features=storm_track
  )
- Run the code; debug if necessary.
6.3 The ArcPy result object
While the Select Layer By Location tool may have run successfully, this tool, unlike the Select and Points To Line tools, which generate new feature classes, does not create any new output, only a virtual selection of records in the input layer. So what value or variable can we use to export the selected features? The answer lies in the ArcPy result object.
All ArcPy geoprocessing tools generate a Result object when run. We can use this to save the [successful] result to a new variable, which can be used in subsequent tools or processes. This result object also includes any messages generated when the tool is executed.
Full documentation on the Result object is here: https://pro.arcgis.com/en/pro-app/latest/arcpy/classes/result.htm.
- Modify your Select Layer By Location process so that it saves its result to a variable called “select_result”:
  #Process: Select by location
  select_result = arcpy.management.SelectLayerByLocation(
      in_layer=usa_counties,
      select_features=storm_track
  )
- Use the Result object’s getOutput() function to save the one (and only) result generated from the process to a variable named selected_counties_lyr:
  #Save the result to a variable
  selected_counties_lyr = select_result.getOutput(0)
- Feel free to explore other properties and methods of the select_result object. In fact, the getMessages() output can be helpful in debugging your process if it doesn’t work properly!
6.4 Copy the selected counties to a feature class
Our last step is to save the selected features to a shapefile in our processed folder. We’ll use the Copy Features tool for this. We’ll have to supply the filename for the output as the out_feature_class parameter, and you have perhaps guessed by now that we’d rather declare the pathname for our output not in the code for the tool itself, but earlier in our script so we can modify it more easily. And if we set that filename after we specify the storm season and name, we can use those values in the name of the file we create.
- At the end of the same code cell where you set your other variables, add code to set the output feature class of counties affected by the storm:
  affected_counties = processed_folder_path / 'affected_counties.shp'
- Now add a new code cell at the end of your notebook and add code to run the Copy Features tool, saving the output to the affected_counties variable.
  - ⚠️ Be sure to convert the affected_counties variable from a Path object to a string with the str() function.
- Run the tool and check that the file was created.
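If you would like the output name to reflect the storm being processed, the storm variables can be folded into the filename. The sketch below is one hypothetical naming scheme (the pattern and the stand-in folder are assumptions, not part of the exercise), and it ends with the str() conversion that the Copy Features tool needs.

```python
from pathlib import PureWindowsPath

storm_season = 2018
storm_name = "FLORENCE"

# Stand-in for the processed_folder_path variable created earlier
processed_folder_path = PureWindowsPath("V:/HurricaneMapper_arcpy/data/processed")

# Build a filename such as FLORENCE_2018_counties.shp
affected_counties = processed_folder_path / f"{storm_name}_{storm_season}_counties.shp"

# Geoprocessing tools expect text, so convert the path object before passing it on
out_feature_class = str(affected_counties)
print(out_feature_class)
```

With a scheme like this, running the notebook for a second storm writes a new shapefile instead of overwriting the first.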
✅ Task 7 (Optional): View your output in your notebook
Here, we’ll examine another ESRI Python package, the arcgis package, which has, among many other features, the capability to display feature classes in our Jupyter notebook. This is just for demonstration purposes, so we won’t adhere to “best coding practices”, such as importing packages early in our notebook. Instead, we’ll just add everything in one new code cell.
7.1 View the storm track and affected counties
If you are curious, documentation for this step is here: https://developers.arcgis.com/python/latest/guide/using-the-map-widget/
- Add a new code cell at the end of your notebook.
- Add the following code:
  #Import the arcgis package
  import arcgis
  #Create a "gis" object
  gis = arcgis.GIS()
  #Create a map
  my_map = gis.map()
  #Set the basemap to oceans
  my_map.basemap.basemap = "oceans"
  #Create feature layers from the storm track and affected counties
  track_lyr = arcgis.GeoAccessor.from_featureclass(storm_track)
  counties_lyr = arcgis.GeoAccessor.from_featureclass(str(affected_counties))
  #Add the affected counties and storm track as layers to the map
  my_map.content.add(counties_lyr)
  my_map.content.add(track_lyr)
  #Show the map
  my_map
7.2 Plot the number of people affected by the storm, broken down by state
- Add a new code cell at the end of the notebook.
- Add the following code:
  #Create a plot of people affected by state
  the_plot = (
      counties_lyr
      .groupby('STATE_NAME')
      .agg({'POPULATION':'sum'})
      .sort_values(by='POPULATION',ascending=False)
      .plot(
          kind='barh',
          title=f'{storm_name}-{storm_season}',
          legend=False,
          xlabel='People affected',
          ylabel=''
      )
  )
7.3 Clear and run your entire notebook with a different storm
- Change the variables to run the notebook with Chantal in 2025.
- Before running, it’s best to clear all outputs and restart the kernel.
7.4 Export your notebook as an HTML file
- In VS Code, select the ellipsis at the right edge of the notebook’s top menu and select Export.
- Select HTML as the export format and save the HTML file to your project folder.
- View the saved document in your favorite web browser. You’ll see all the code and the plot but, unfortunately, not the map.
- If you want to save the map (as its own HTML document), you can change the last line in that code cell from my_map to my_map.export_to_html('my_map.html').
Recap
You have now created a fully reproducible workflow for your Hurricane Tracking tool!