Geoprocessing Workflows in Python

ENV 859 - Geospatial Data Analytics | Fall 2025 | Instructor: John Fay

Introduction & Learning Objectives

Having now covered the basics of Python, including how to work with built-in and 3rd party Python packages such as ArcPy, we are now ready to explore how spatial analysis can be done with Python. Using the Hurricane Mapping tool as a template, we’ll replicate the geoprocessing workflow we produced in the ArcGIS Pro geoprocessing modeler, but now in a fully transparent, fully reproducible Jupyter notebook. In doing so, we’ll cover the following:

Learning Objectives:

Hone basic Python coding skills by effectively interacting with the ESRI help documentation on ArcPy.
Develop scripting skills while recognizing that writing scripts is an iterative, non-linear process.
Create and maintain an organized coding workspace.
Import and apply the ArcPy package alongside other commonly used Python packages.
Identify and implement the correct syntax for ArcPy geoprocessing tools.
Define and manage coding variables, including pathnames to spatial datasets.
Use the pathlib package’s Path class to create and manage relative paths.
Configure ArcPy environment variables using the arcpy.env module.
Work with and manage outputs from geoprocessing tools.
Execute multiple geoprocessing tools in sequence within a script.
Incorporate process update messages and map outputs into code for clarity and documentation.

✅ Task 1: Preparing the Workspace

As with all our spatial analysis tasks, everything begins with creating a tidy project workspace with subfolders to keep files organized. We will continue to place our data in a data folder, but separate data into “raw” and “processed” subfolders, with original, imported datasets going into the raw folder, and the results of our processing going into the processed folder. We’ll also create a “src” folder (short for “source”), which is a convention in many software development projects and also adds to our workspace’s organization.

Create a project folder on your V: drive. Name it whatever you want, but be sure it includes no spaces or unusual characters.
Within this project folder, create data folders for your data:
- First create a “data” folder, and within this data folder create folders called “raw” and “processed”
Within this project folder, create a folder for your code and then create an initial notebook file:
- Create a folder named “src”
- In this folder, create a new text file, renaming it “HurricaneTracker_v1.ipynb”
Download and unzip the North Atlantic IBTrACS point feature class into your raw data folder.

Note the original link for these data is here. The link provided above is to the same data, but with the shapefile renamed “IBTrACS_NA.shp” because the original shapefile name, with its multiple periods, causes issues with ArcGIS Pro and ArcPy. A metadata README file is included with this dataset.
Create a Readme.txt file in the project folder, and in this file include a short description of the project, your email, and the date.

Your workspace should now resemble this schematic:

    Project_folder/
         |
         ├ data/
         |   |
         |   ├ raw/
         |   |	├ IBTrACS_NA.dbf
         |   |	├ IBTrACS_NA.prj
         |   |	├ IBTrACS_NA.shp
         |   |	└ IBTrACS_NA.shx
         |   |
         |   └ processed/
         |
         ├ src/
         |   |
         |   └ HurricaneTracker_v1.ipynb
         |
         └ Readme.txt
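
The folder layout above can also be scripted rather than built by hand. The following is a minimal sketch using pathlib; the function name create_workspace and the placeholder Readme text are illustrative assumptions, not part of the exercise, and the notebook file itself is still left for you to create.

```python
from pathlib import Path


def create_workspace(project_root):
    """Create the data/raw, data/processed, and src folders plus a Readme.txt stub."""
    root = Path(project_root)
    for subfolder in ("data/raw", "data/processed", "src"):
        # parents=True builds intermediate folders; exist_ok=True makes re-runs safe
        (root / subfolder).mkdir(parents=True, exist_ok=True)
    readme = root / "Readme.txt"
    if not readme.exists():
        # Placeholder text only; replace with your own description, email, and date
        readme.write_text("Project description, email, and date go here.\n")
    return root
```

Calling create_workspace with the path to your project folder would build the skeleton in one step, and running it a second time leaves existing folders and files untouched.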

✅ Task 2: Initialize your notebook and add a short description

One main advantage of Jupyter notebooks is to make our code easy to follow with the use of Markdown cells. So we often start with a Markdown cell that provides a little background to what the notebook will do. I also find adding a short description, sometimes even a bulleted workflow, allows me to keep focused on the coding task at hand.

Open your project folder in VSCode.
Open the Jupyter notebook file in the VSCode Editor.
Set the Kernel to use the arcgispro-py3 kernel.
Add a markdown cell, and in that cell add:
- A notebook title, in a large font
- A short description of what the code will do
- Your name and the date

✅ Task 3: Import packages

Scripts and notebooks using packages typically import those packages early on in the code. This practice allows others to see up front what packages are required before running any code.

Add a new code cell to your notebook.
Add a comment line indicating what this cell does: #Import packages
Import the ArcPy module: import arcpy
From the pathlib package, import the Path class: from pathlib import Path

✅ Task 4: Use ArcPy to subset the TrackPoint shapefiles

Similar to adding a new process to our ArcGIS Pro geoprocessing modeler, we’ll start by adding code to execute a single tool, in this case the Select tool. Here, of course, we can’t simply drag and drop the tool into our code; instead, we need to determine the proper syntax for the tool we want to use.

4.1 Get the syntax for the tool

You have many options for identifying the proper syntax of a tool, but the most reliable is the ArcGIS Pro help.

Open the ArcGIS Pro on-line help: https://pro.arcgis.com/en/pro-app/latest/help/main/welcome-to-the-arcgis-pro-app-help.htm
Click on the Tool Reference menu bar and expand the Geoprocessing Tools menu list on the left side.
Find the Select tool in the Analysis > Extract toolbox.
Navigate to the Parameter section and click the Python tab to expose the Python syntax for the Select tool.
Notes on ArcPy geoprocessing tools

A few important facets of the structure and syntax of all ArcPy geoprocessing tools:
- The tool names are preceded by “arcpy.” and then the name of the toolbox in which it is found: arcpy.analysis.Select.
- Geoprocessing tools can have a mix of required and optional parameters; in the documented syntax, the latter are encased in curly braces (“{}”).
- For tools that have a spatial dataset as an input or output parameter, we provide the path string to the dataset.

So, to execute the Select tool, we need to provide the input feature class (‘in_features’), the output feature class (‘out_feature_class’), and optionally the selecting SQL expression (‘where_clause’).

4.2 Code the tool

Add a new code cell and type in Select command, setting the parameters as follows:

Parameter	Value
in_features	The absolute path to the IBTrACS shapefile…
out_feature_class	The absolute path to your “`processed`” folder followed by “`\selected_points.shp`”
where_clause	`"SEASON = 2018 And NAME = 'FLORENCE'"`

While not necessary for code execution, I recommend making your code as legible as possible. This means: include the parameter names in your code, and enter each parameter on a separate line. You should also precede the code with a comment line for clarity.

Thus, your code would look something like:

#Select track points corresponding to a single storm
arcpy.analysis.Select(
    in_features = "V:\\HurricaneMapper_arcpy\\data\\raw\\IBTrACS_NA.shp",
    out_feature_class = "V:\\HurricaneMapper_arcpy\\data\\processed\\selected_points.shp",
    where_clause = "SEASON = 2018 And NAME = 'FLORENCE'"
)

4.3 Test the tool

Now run your code. If all goes well, it should generate a message that looks something like the one shown below, and you should have a new shapefile named “selected_points.shp” in your processed folder.

Messages

Start Time: Wednesday, April 9, 2027 8:39:01 AM
Succeeded at Wednesday, April 9, 2027 8:39:03 AM (Elapsed Time: 1.52 seconds)

Debugging…

If your tool fails to run, you’ll have to debug your error. As you work more with Python and geoprocessing tools, your knack for debugging will improve as you’ll build a mental list of the more common sources of errors. For this, and most geoprocessing tools, the common errors are pesky typos in the tool name or its parameters:

If a geoprocessing tool cannot locate an input dataset from its path, then check the path. Some things to try:
- Add a new code cell and print the path as typed in your code:
  
  print("V:\\HurricaneMapper_arcpy\\data\\raw\\IBTrACS_NA.shp")
  
  Does it look correct? It’s easy to overlook the backslashes in paths…
- See if ArcPy can find the file with the arcpy.Exists() command:
  
  arcpy.Exists("V:\\HurricaneMapper_arcpy\\data\\raw\\IBTrACS_NA.shp")
  
  If the tool returns False then something is wrong with the path or the dataset itself.
The error might also be with the “where_clause”, which must follow SQL syntax. SQL can be confusing as to when you include quotes around the attribute you are selecting and when you don’t. (Note: single quotes go around text attributes and no quotes around numeric ones.) If in doubt, you can create the query in ArcGIS Pro using the non-SQL interface and see whether quotes are used or not.

Furthermore, coding SQL statements can be tricky because each statement is a string that often contains quotes itself.
- Check that the where_clause looks correct by printing it:
  
  print("SEASON = 2018 And NAME = 'FLORENCE'")
Other debugging approaches include dropping all optional parameters (here, the where_clause) and seeing if the tool runs. Do anything you can to get the tool to run, and then amend the tool bit by bit to see where, precisely, the error occurs and try to identify what’s causing it.
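
To see why backslashes in paths are so easy to get wrong, the short sketch below compares three ways of typing the same folder fragment. The variable names are illustrative only; the point is that a single backslash followed by certain letters (here, “r”) is read by Python as an escape sequence rather than as a path separator.

```python
# "\r" inside an ordinary string is an escape sequence (a carriage return),
# so this string silently loses its backslash
risky = "data\raw"

# Doubling the backslash, or prefixing the string with r, keeps it literal
escaped = "data\\raw"
raw = r"data\raw"

# len() and repr() expose the difference that a plain print() can hide
print(len(risky), repr(risky))
print(len(escaped), repr(escaped))
print(escaped == raw)
```

The first string is one character shorter than the other two because the backslash and the “r” were merged into a single control character, which is exactly the kind of error that makes a tool report that it cannot find its input.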

✅ Task 5: Streamline the Select tool

If you look at your Select tool, it runs, but it can be made more robust with a few changes. First, the storm season and name are hard coded in the tool; we’ll want those as user input variables, so we can amend that with storm_season and storm_name variables. Also, the paths used in the tool are absolute paths, and relative paths would enable our code to be run wherever we place our coding workspace. And finally, setting the output feature class to a variable will facilitate using this output as the input to subsequent geoprocessing tools.

So let’s streamline our tool to make it more robust and to facilitate later geoprocessing operations. First, however, we’ll take a moment to learn two useful objects for working with paths in ArcPy scripts: ArcPy’s env module and pathlib’s Path object.

5.1 ArcPy’s env module

Try running your script again and you’ll get an error that the output already exists. The obvious solution is to go to the processed folder and delete the output shapefile. But there is a setting that tells ArcPy it’s OK to overwrite outputs, and that setting is in the ArcPy env module. This module is used to get and set various environment settings, similar to where we set the default and scratch workspace in ArcGIS Pro.

A list of the settings and other operations you can access through arcpy.env is available here. On this list, you’ll see a setting called overwriteOutput. Setting that to True will enable us to run our Select tool repeatedly without having to manually delete the output, which is quite helpful when writing and debugging code.

5.1.1 Add code to allow overwriting output

Add a new code cell just below the one where you imported your packages.

In this code cell, insert the following code:

#Allow arcpy to overwrite output
arcpy.env.overwriteOutput = True

Now run the Select tool again. It should work, overwriting any existing output layer!

We can also use arcpy.env to set the default and scratch workspace so that we can omit paths. Let’s see how this works.

5.1.2 Add code to set default paths to our data

In the same code cell you just created, add the following lines of code, replacing the path with the path to your raw folder.
```
#Set the default workspace
arcpy.env.workspace = "V:\\HurricaneMapper_arcpy\\data\\raw"
```
Now, remove the folder path from the in_features parameter of your Select tool, leaving only the shapefile name, and re-run it.

Note: While we can set the arcpy.env.scratchWorkspace variable as well, ArcPy doesn’t use it as ArcGIS Pro does. This scratch workspace is only used for tools that output a folder, not a file. If we omitted the path from the Select tool’s out_feature_class parameter, the output would also go to the arcpy.env.workspace folder.

Full documentation of the arcpy.env module is here.

5.2 The `pathlib` package’s `Path` object

We still have an absolute path in our arcpy.env.workspace statement, meaning if we moved our workspace to a different folder, the code wouldn’t work without editing it to update the path. The Path object that we imported in the first code cell can help with this: Path.cwd() returns the current working directory, which is the directory in which the script resides.

5.2.1 Explore the `Path` object

Create a new code cell for exploring the Path object. Place this between the 1st and 2nd code cells, i.e., between where we import packages and apply the ArcPy environment settings.
In this new code cell, add and run the code:

Path.cwd()

The Path.cwd() command returns the absolute path to the folder in which the notebook is located. If we were to move our project workspace to the C:\Temp folder, this would return WindowsPath('c:/temp/HurricaneMapper_arcpy/src'). The object returned is a Path object that we can use to navigate out of (via its parent property) or into (via the / operator) folders, as we’ll see next.
Now change the code to:

Path.cwd().parent

and run. Do you see what the parent of the folder is?
Next, change the code to:
```
raw_folder_path = Path.cwd().parent / 'data' / 'raw'
print(raw_folder_path)
print(raw_folder_path.exists())
print(type(raw_folder_path))
```
We’ve assigned the path to our raw folder to a variable. Note that we start with the current working folder (src), navigate to its parent (the project root folder), and then into the data folder and finally to the raw subfolder. The exists() command confirms that this is a valid path, and the type() command reminds us that the raw_folder_path variable is a Path object, not a string.
Remove the print statements, as they were for demonstration purposes only.
Add a comment above the code creating the raw_folder_path describing what the code is doing.

5.2.2 Create a variable pointing to the `processed` folder

In the same code cell above, add some more code to create a variable called processed_folder_path , setting its value to the processed folder path (just as you did for the raw folder…).

5.2.3 Set the `arcpy.env` working directory to a relative path.

Edit the code setting your ArcPy default workspace so that it points to the raw_folder_path object. Note, however, that you’ll first need to convert raw_folder_path from a Path object to a string, which can be done with the str() function: str(raw_folder_path)
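
The navigation and conversion steps described above can be collected into one place. The following is a sketch only: the helper name data_folders and the variable workspace_string are hypothetical, and it assumes the notebook sits in a src folder beside the data folder, as in the workspace schematic.

```python
from pathlib import Path


def data_folders(notebook_folder):
    """Return (raw, processed) folder paths relative to the folder holding the notebook."""
    # Step up from src to the project root
    project_root = Path(notebook_folder).parent
    raw_folder_path = project_root / "data" / "raw"
    processed_folder_path = project_root / "data" / "processed"
    return raw_folder_path, processed_folder_path


raw_folder_path, processed_folder_path = data_folders(Path.cwd())

# ArcPy settings and tool parameters expect plain strings, so convert with str()
workspace_string = str(raw_folder_path)
print(type(raw_folder_path).__name__, "->", type(workspace_string).__name__)
```

In the notebook, the resulting string is what you would assign to arcpy.env.workspace, so the code keeps working wherever the project folder is moved.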

5.3 Streamline the Select tool code with variables

Now we’ll improve our Select tool code by including variables for the where clause and using the memory workspace for the output. This will make our code both more robust and easier to modify if we want to change values like the storm selection criteria or tool inputs/outputs.

5.3.1 Set and use variables for selecting the storm

By pulling the storm season and storm name out as variables, it’s easier to locate and update these two values as opposed to editing the “where_clause” of the Select tool. Likewise, creating a variable for the Select tool’s output allows us to use its output in subsequent tools more easily. Lastly, we’ll set the output to be an “in-memory” feature class, which will speed up our tool’s execution, as this is an intermediate data layer.

Create a code cell above the one where the Select tool is run.
Create two variables: storm_season and storm_name, setting them to “2018” and “FLORENCE” respectively.
In the same code cell, add another variable named “selected_points”, setting its value to “memory\\selected_points”
In the code cell where the Select tool is run, alter the parameters:
- Set the out_feature_class to be the “selected_points” variable
- Using string formatting, modify the where_clause to incorporate the storm_season and storm_name variables.
Reset and run your script to check for issues.
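
One way to build the where_clause from the two variables is with an f-string, sketched below. This is a single possible approach rather than the required one; it assumes, as in the original query, that SEASON is compared without quotes and NAME with single quotes.

```python
storm_season = 2018
storm_name = "FLORENCE"

# SEASON is compared without quotes; NAME, a text attribute, needs single quotes
where_clause = f"SEASON = {storm_season} And NAME = '{storm_name}'"

# Printing the finished string is a quick check before handing it to the tool
print(where_clause)  # prints: SEASON = 2018 And NAME = 'FLORENCE'
```

Because the f-string inserts the values as text, the same line works whether storm_season is stored as a number or as the string “2018”.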

5.4 [Optional] Use ArcPy’s `GetCount()` to check your output

With our output being sent to memory, it’s hard to check whether the Select tool worked properly. One way to be sure is to set the tool output to go to our processed folder and open it in ArcGIS Pro. Another, perhaps easier way is to use ArcPy’s GetCount() function to do a quick check.

Add a new code cell to your notebook below the one in which you apply the Select tool.

Add the code:

arcpy.management.GetCount(selected_points)

and run.

Hurricane Florence should have 156 points associated with it.
Try modifying your code to check with Hurricane Katrina in 2005, which should return 64 points.
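
Get Count, like other geoprocessing tools, returns a Result object rather than a bare number. If you want the count as an integer, for example to compare it against the expected values above, a small helper along these lines may be useful. The function name feature_count is hypothetical; it only assumes that the object passed in has the Result object’s getOutput() method.

```python
def feature_count(count_result):
    """Convert the first output of a Get Count result into an integer."""
    # getOutput(0) returns the first (and only) output value of the tool;
    # int() handles the value whether it arrives as text or as a number
    return int(count_result.getOutput(0))
```

In the notebook this could be used as feature_count(arcpy.management.GetCount(selected_points)).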

✅ Task 6: Continue the workflow:

6.1 Convert the selected storm points to a storm track line

Repeat the steps above, but for the Points To Line tool (in the Data Management toolbox). You’ll need to look up the syntax for the tool in the ArcGIS Pro help.

Set the Input_Features parameter to the output of the Select tool (the variable created in Step 5.3.1).
Create a variable called storm_track to hold the Output_Feature_Class in the same code cell where you created a variable to hold the output of the Select tool. Set the variable’s value to be “memory\\Tracklines” - another memory layer.
Be sure to sort the line on the “ISO_TIME” field.
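
The Points To Line step might be sketched as follows. This is only one possible arrangement, assuming the variable names from the earlier steps and the parameter names shown on the tool’s help page; the dictionary name points_to_line_params is illustrative and not part of the exercise. The tool is called only when arcpy can be imported and the selected points exist, so the cell can be run safely on its own.

```python
# Variables defined earlier in the notebook (values taken from the steps above)
selected_points = "memory\\selected_points"
storm_track = "memory\\Tracklines"

# Collect the parameters in a dictionary so they can be inspected before running
points_to_line_params = {
    "Input_Features": selected_points,
    "Output_Feature_Class": storm_track,
    "Sort_Field": "ISO_TIME",
}

try:
    import arcpy  # only available in the ArcGIS Pro Python environment
except ImportError:
    arcpy = None

if arcpy is not None and arcpy.Exists(selected_points):
    # Connect the selected points into a line, ordered by observation time
    arcpy.management.PointsToLine(**points_to_line_params)
else:
    print("Tool not run; parameters would be:", points_to_line_params)
```

Writing the parameters by name, one per line, follows the same legibility advice given for the Select tool.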

6.2 Select Counties that intersect the storm track line

We’ll now implement the Select Features By Location tool to select US counties that intersect our storm track line. Recall that the US Counties dataset is an on-line feature service, accessed by providing its URL, so we’ll have to investigate how that works in our geoprocessing tool.

Begin by reviewing the Python code for the Select Features By Location tool in the ArcGIS Pro documentation. Note that we’ll need to code two parameters for the tool: in_layer and select_features.
- The in_layer parameter will be the URL associated with our US Counties feature layer.
- The select_features parameter will be the variable referencing the storm track line.
We could hard-code the URL of the US Counties feature layer in the Select Features By Location tool and it would work fine. However, setting a variable to this URL earlier in our script – where we set other variables (e.g. for storm season/name and the Select and Points To Line outputs) – is good practice. Why? So that if the link breaks (as can certainly happen with on-line datasets) or if we want to re-use the US Counties layer somewhere else in our code, we can find and update it more easily than if we had to search for the process where the URL is used.

So, in the code cell where you set variables for the storm season and name, add a new line that sets a new variable called usa_counties to the URL of the US Counties feature service:
```
usa_counties = 'https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0'
```
[The code should all be on a single line in your notebook.]

Now, add a new code cell at the end of your notebook and insert the code for the Select Features By Location:

#Process: Select by location
arcpy.management.SelectLayerByLocation(
    in_layer=usa_counties,
    select_features=storm_track
)

Run the code and debug if necessary.

6.3 The ArcPy `result` object

While the Select Features By Location tool may have run successfully, this tool – unlike the Select and Points To Line tools, which generate new feature classes – does not create any new output, only a virtual selection of records in the input layer. So what value or variable can we use to export the selected features? The answer lies in the ArcPy Result object.

All ArcPy geoprocessing tools generate a Result object when run. We can save this [successful] result to a new variable, which can be used in subsequent tools or processes. This Result object also includes any messages generated when the tool is executed.

Full documentation on the Result object is here: https://pro.arcgis.com/en/pro-app/latest/arcpy/classes/result.htm.

Modify your Select Layer By Location process so that it saves its result to a variable called “select_result”:

#Process: Select by location
select_result = arcpy.management.SelectLayerByLocation(
    in_layer=usa_counties,
    select_features=storm_track
)

Use the Result object’s getOutput() function to save the one (and only) result generated from the process to a variable named selected_counties_lyr:
```
#Save the result to a variable
selected_counties_lyr = select_result.getOutput(0)
```
Feel free to explore other properties and methods of the select_result result object created. In fact, the getMessages() output can be helpful in debugging your process if it doesn’t work properly!
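
As a starting point for that exploration, the sketch below gathers a few Result properties and methods into a dictionary. The helper name summarize_result is hypothetical; it relies only on members listed in the Result documentation linked above (status, outputCount, getOutput(), and getMessages()).

```python
def summarize_result(result):
    """Collect a few commonly used pieces of an ArcPy Result object into a dictionary."""
    return {
        "status": result.status,              # integer status code of the tool run
        "output_count": result.outputCount,   # number of outputs the tool produced
        "first_output": result.getOutput(0),  # the value retrieved with getOutput(0)
        "messages": result.getMessages(),     # full message text, useful for debugging
    }
```

In the notebook, summarize_result(select_result) would return these values for the Select Layer By Location run, and printing the “messages” entry shows the same text that appears in the tool’s message log.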

6.4 Copy the selected counties to a feature class

Our last step is to save the selected features to a shapefile in our processed folder. We’ll use the Copy Features tool for this. We’ll have to supply the filename for the output as the out_feature_class parameter, and you have perhaps guessed by now that we’d rather declare the pathname for our output not in the code for the tool itself, but earlier in our script so we can modify it more easily. And if we set that filename after we specify the storm season and name, we can use those values in the name of the file we create.

At the end of the same code cell where you set your other variables, add code to set the output feature class of counties affected by the storm:
```
affected_counties = processed_folder_path / 'affected_counties.shp'
```
Now add a new code cell at the end of your notebook and add code to run the Copy Features tool, saving the output to the affected_counties variable.
- ⚠️Be sure to convert the affected_counties variable from a Path object to a String object with the str() function.
Run the tool and check that the file was created.
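
If you would like the output name to reflect the storm being processed, the filename can be built from the storm variables. The sketch below uses a hypothetical naming pattern (the name and season appended to “affected_counties”) and recreates processed_folder_path as a stand-in for the variable defined earlier in the notebook.

```python
from pathlib import Path

storm_season = 2018
storm_name = "FLORENCE"

# Stand-in for the processed_folder_path variable created earlier in the notebook
processed_folder_path = Path.cwd().parent / "data" / "processed"

# Hypothetical naming pattern: embed the storm name and season in the shapefile name
affected_counties = processed_folder_path / f"affected_counties_{storm_name}_{storm_season}.shp"

# Geoprocessing tools take path strings, so convert before using it as out_feature_class
out_feature_class = str(affected_counties)
print(affected_counties.name)
```

With this pattern, rerunning the notebook for a different storm writes a new shapefile instead of overwriting the previous one.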

✅ Task 7 (Optional): View your output in your notebook

Here, we’ll examine another ESRI Python package, the arcgis package, which has, among many other features, the capability to display feature classes in our Jupyter notebook. This is just for demonstration purposes, so we won’t adhere to “best coding practices”, such as importing packages early in our notebook. Instead, we’ll just add everything in one new code cell.

7.1 View the storm track and affected counties

If you are curious, documentation for this step is here: https://developers.arcgis.com/python/latest/guide/using-the-map-widget/

Add a new code cell at the end of your notebook.

Add the following code:

#Import the arcgis package
import arcgis
  
#Create a "gis" object
gis = arcgis.GIS()
  
#Create a map
my_map = gis.map()
  
#Set the basemap to oceans
my_map.basemap.basemap = "oceans"
  
#Create spatially enabled DataFrames from the storm track and affected counties
track_lyr = arcgis.GeoAccessor.from_featureclass(storm_track)
counties_lyr = arcgis.GeoAccessor.from_featureclass(str(affected_counties))
  
#Add the affected counties and the storm track as layers to the map
my_map.content.add(counties_lyr)
my_map.content.add(track_lyr)
  
#Show the map
my_map

7.2 Plot the number of people affected by the storm, broken down by state

Add a new code cell at the end of the notebook