Automating GIS Workflows in Python
Introduction & Objectives
In our previous lesson, we built a Python-based geoprocessing workflow using ArcPy. The result — a Jupyter Notebook that combined the IBTrACS storm archive and U.S. county features — produced a feature class and summary statistics for a single storm season and name. Along the way, we learned how to structure code clearly and make our analysis reproducible and adaptable.
But reproducibility is just the beginning. What if we wanted to process every storm in the IBTrACS archive, or schedule the workflow to run automatically as new data becomes available? Rather than running our notebook manually, we can design our Python code to run itself — repeatedly, reliably, and even unattended.
In this lesson, we’ll move from interactive analysis to automation: transforming our notebook logic into a Python script that can batch-process many storms or run as a custom ArcGIS Pro tool. Along the way, you’ll see how Python empowers GIS professionals to think beyond single analyses — to engineer workflows that scale, integrate, and adapt to real-world data challenges.
Learning Objectives
Through this exercise you should achieve the following learning objectives.
- Differentiate between an interactive geoprocessing workflow (e.g., Jupyter Notebook) and an automated Python script designed for repeated or scheduled execution.
- Explain how automation enhances reproducibility, scalability, and efficiency in geospatial workflows.
- Refactor a Jupyter Notebook–based ArcPy workflow into a stand-alone Python script suitable for batch processing.
- Implement looping and parameterization in Python to process multiple spatial datasets (e.g., storms across seasons).
- Incorporate error handling and logging to make automated workflows robust and reliable.
- Design a Python script that can be used both independently and as an ArcGIS Pro script tool with user-defined parameters.
- Demonstrate how to schedule or trigger automated workflows using Windows Task Scheduler or ArcGIS Pro task automation options.
- Evaluate when and why to transition from manual or semi-automated analysis to fully automated geoprocessing systems in professional GIS environments.
Central Task
You are the lead analyst on a team that wants to study trends in North Atlantic storms over time. Specifically, you have been tasked with determining how many counties were impacted by a storm in each season across the years 2000-2024. The resulting table should look like this:
| Storm Season | Counties_Impacted |
|---|---|
| 2000 | <The number of counties intersecting all storm tracks in 2000> |
| 2001 | <The number of counties intersecting all storm tracks in 2001> |
| … | … |
We will assume that a script that can produce a CSV file containing the information in the table above will be sufficient. However, we may also explore code that plots the number of storm-affected counties per year, as generated from the IBTrACS data.
Approach: Geoprocessing Workflow to Automation
We’ve already written the bulk of the code we’ll need for this objective when we crafted our geoprocessing workflow notebook. And our effort at writing our code cleanly, with variables set clearly at the front of our code, will pay dividends as we leverage that code here. We’ll still have to make some changes and add a bit of extra code (e.g., to generate a list of storms), but this exercise will illustrate the utility of writing clear code!
Pseudocode
Below is the pseudocode for how we’ll achieve our objective. As always, we’ll break the tasks down into manageable steps, ensuring each one runs as expected and without error. And per usual, we’ll pause and learn a bit more about Python and coding as we go, and perhaps even discover more efficient ways of accomplishing these tasks, but that is the nature of learning how to code: start with a plan (pseudocode) and tackle that plan!
- Iterate over the years 2000-2024
- For a given year:
- Generate a collection of all named storms for that year from the IBTrACS point feature class.
- For each storm in this list:
- Select the points corresponding to that storm
- Convert points to lines
- Select county features intersecting the storm track
- Count the number of county features
- Add year table to a list of tables (that will be merged)
- Merge the yearly tables into a single table
- [Optional] Plot the time series of # of counties affected by storms across years.
Plan of attack (and missing pieces)
With pseudocode as our “blueprint”, we’ll now map out a “plan of attack”. We already have code that can produce a feature class of the counties intersecting a given storm, identified by its season and name, but we’ll still need to figure out how to:
- Convert our existing Jupyter notebook into a Python script
- Add code that tabulates the number of counties affected by a given storm
- Add code to accumulate the count of counties affected across all storms in a given year
- Add code to generate a list of storms for a given year from the IBTrACS point feature class
- Iterate through seasons, then through each storm in a given season and compute the total # of counties in each season
- Possibly create a plot showing trends in the number of affected counties over time
😎Remember, when tackling a coding script, it’s best to break it down into simple steps, execute those steps, and gradually add complexity!
Coding/Debugging Tips
We’ve discussed how you can step through your code in debugger mode, which is effective but can be cumbersome. Also helpful is using print() statements to report the status of your script. This is useful not only for monitoring the progress of your code, but also for seeing exactly where things might go wrong. And if you print the values of key variables, you may get some insight into whether your code is producing the expected results.
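For example, a few status messages sprinkled into a processing loop might look like the sketch below; the variable names here are placeholders standing in for whatever your own script actually uses.

```python
# Placeholder values standing in for variables in your own script
season = 2024
storm_name = "HELENE"
county_count = 55

print(f"Processing season {season}, storm {storm_name}...")  # progress message
print(f"  -> counties affected: {county_count}")              # key value to sanity-check results
```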
Tasks
🟦Task 1. Fork and clone the HurricaneMapper_PY repository
We will return to using Git and GitHub in this exercise, as we can always use more experience with this technology. I have created a base repository in the ENV859 GitHub account. You can view (pull) but not update (push to) this repository, so instead you will make your own copy of it through a process called “forking”. This process makes an exact replica of the repository under your GitHub account, one that you can modify to your heart’s content. It also maintains a link to its parent repository, which can facilitate interaction with that parent. (We will discuss in class what types of interactions these might be.)
✅1.1 How to fork the repository:
- Log into your GitHub account in any web browser.
- Navigate to the `HurricaneMapper_PY` repository (https://github.com/ENV859/HurricaneMapper_PY) in your browser.
- Towards the top of the repository page, find a button that says `Fork` and click it.
- In the page that appears to create the new fork, accept the defaults and click `Create fork`.
- You now have a forked copy of the class repository.
✅1.2 Clone the forked repository to your local machine using VS Code
[VS Code on your machine should already be linked to your GitHub account. If not, see this link.]
- Open VS Code. If it opens an active workspace, close that folder.
- From the action bar on the left, activate the Source Control menu
- Select Clone Repository, then Clone from GitHub in the option that appears.
- Paste (or find) the repository URL when prompted for the repository name.
- Clone the repository to your V: drive.
You now have a clone of the forked repository on your local machine. A few more tweaks and you are ready to begin coding!
✅1.3 Customize the workspace
- Edit the README file so it includes your name and notes that this is a forked repository, including the URL of the source repository.
- Navigate to the `data/raw` folder and unzip the `IBTrACS_NA.zip` file.

🤔 Some questions we will address in class:

- Why not just include the raw shapefile instead of having the user unzip the file?
- What is the `.gitignore` file in the project’s root folder?
- What is the `.gitkeep` file in the `data/processed` folder?
Ok, now we are ready to begin coding!
🟦Task 2. Convert your existing Hurricane Tracking notebook to a Python script
Notebooks are great for sharing our code and its output with others, but Python scripts are more robust for automating tasks. They also work better when coding with GitHub because the text is pure code, allowing us to track changes much more easily. So we are going to be coding a Python script. However, we still want to leverage the code we crafted in our notebook. We could simply copy and paste from our notebook into our script, but instead we’ll explore VS Code’s export tool.
✅2.1 How to export your notebook to a script:
- Open your notebook in VS Code.
- In the toolbar towards the top of the notebook, click on the ellipses (`...`) and select `Export`.
- Export your notebook as a Python script.
- Save the generated script as `HurricaneCounter.py`.
- Add the script to the Source Control staging area, and commit the change to your local Git repository with an appropriate commit message.

You’ll see that your script contains the notebook’s markdown as comments and the code organized into code cells (marked with `#%%`). You may wish to edit either of these to make your code more readable. Do as you wish; we’ll focus on the code itself, not how it’s organized.
🟦Task 3. Code to count the number of counties per storm
If your coding objective involves iteration, it’s always good to start by nailing down the code that you are going to iterate, and then set up the iteration around it. Our objective involves changing our geoprocessing workflow so that it doesn’t produce a feature class of affected counties, just a count of how many counties were affected by a given storm. So let’s focus here on that goal: how to generate a simple count of counties affected by a single storm.
It turns out it’s quite simple: we just need to apply the GetCount() tool to the counties intersecting the storm track. In fact, we can eliminate the code that copies the selected features to a new feature class. The trickiest part may be how to access the result of the GetCount() tool and store it as a variable (and then what to do with that variable), but we have already dealt with working with result objects. So, here’s what to do.
✅ 3.1 Getting the count of counties affected by a specific storm
- Delete the code cell at the very end of your `HurricaneCounter.py` script, the cell that uses the `CopyFeatures()` tool to copy the selected counties to an output feature class.
- To keep your code tidy, also delete the line in your code where you set the output feature class name: `affected_counties = processed_folder_path / 'affected_counties.shp'`
- Stage and commit your changes to Git.
- Add a new code cell (or just code, if you are not working with code cells) at the end of your script. In this chunk, add code that applies the `GetCount()` tool (in the `management` toolbox) to the `select_result` variable, which points to the in-memory feature class of counties intersecting the storm track. (A minimal sketch appears after this list.)
  - Be sure to save the result of this process to a variable, using the `.getOutput(0)` method to extract the actual count of counties.
  - The extracted value will be the count, but as a string. Use the `int()` function to convert it to an integer.
- Test your code. The value of your variable for “HELENE” in 2024 should reveal 55 counties were impacted.
- If successful, stage and commit your changes.
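Here is a minimal sketch of that pattern. It assumes, as described above, that `select_result` already points to the selected counties in your script; the other names are placeholders.

```python
import arcpy  # already imported near the top of your script

# Apply GetCount() to the selection of counties intersecting the storm track
count_result = arcpy.management.GetCount(select_result)

# getOutput(0) returns the count as a *string*; convert it to an integer
county_count = int(count_result.getOutput(0))

print(f"Counties affected: {county_count}")  # e.g., 55 for HELENE in 2024
```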
🟦Task 4. Creating lists of named storms for each storm season
We now have code that can take a storm, defined by its season and name, and compute the number of counties affected by it. Our next focus is to compute the sum of the number of affected counties across all named storms in a given season. To do this, however, we need to generate a list of those names, names which change from year to year. And that is our objective for this task: how to list all the storms in a given season.
✅ 4.1 Generating a list for a single season
We’ll start with generating code for a single season, and then we’ll worry about iterating this code through all seasons.
We’ll do this with a SearchCursor: we can provide a “where clause” to subset records (e.g., for a specific year or “season”) when we create the cursor, and then we can iterate through each selected record and add storm names to a list. We’ll want to do this above the code where we use storm seasons and names to select and process IBTrACS points, but after the code where we assign a variable to the location of the IBTrACS.shp feature class. So insert a new code cell around line 32. In this code chunk (a sketch appears after the checklist below):
- Create an empty list that will hold the names for that season.
- Create a search cursor object from the IBTrACS feature class, setting the `where_clause` to `'SEASON = 2000'` and extracting just the `NAME` field.
- Iterate through all records in the cursor, and add the storm `NAME` to the list, but only if [1] that name doesn’t already exist in the list and [2] the name is not “UNNAMED” (as we only want to process named storms).
- Delete the cursor.
- Run your code and check your list. You should see that there are 14 named storms in the 2000 season.
- Stage and commit your code.
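A sketch of this chunk might look like the following. The path variable `ibtracs_points` is a placeholder for whatever variable your script already uses to point to the IBTrACS shapefile.

```python
import arcpy

ibtracs_points = r"data/raw/IBTrACS_NA.shp"    # placeholder: use your existing path variable

storm_names = []                                # empty list to hold this season's storm names

# Cursor returns only the NAME field, restricted to records from the 2000 season
cursor = arcpy.da.SearchCursor(ibtracs_points, ["NAME"], where_clause="SEASON = 2000")
for row in cursor:                              # each row is a tuple, e.g. ("ALBERTO",)
    name = row[0]
    if name != "UNNAMED" and name not in storm_names:
        storm_names.append(name)                # add new, named storms only
del cursor                                      # release the cursor

print(len(storm_names), "named storms found")   # expect 14 for the 2000 season
```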
✅ 4.2 Extending our code to create lists for each season
With the code above in place, we’ll now iterate through all the years in our study (2000-2024) and generate a list for each. We’ll want to refer to these lists later in our code, so we’ll store these lists in a dictionary: the key will be the season, and the value will be the list of storm names found in that season.
We’ll add this code to the code cell we created in Step 4.1.
- Begin by creating an empty dictionary that will hold lists of names keyed by the season.
- Below that, begin a for loop to iterate through the range of seasons we want to analyze.

  😎 Tip: While writing and testing your code, it’s best to just do a few years, not the full set, so you don’t have to wait as long to check your output.

- Indent the code we created in Step 4.1 that generates the list of names for a given season so that it becomes the code chunk that is run within the for loop.
- Also, modify the where clause in the code that creates the search cursor so that it selects for the season specified in the for loop, not just “2000”.
- At the end of the for loop code chunk, add a line of code to add an item to your dictionary with the key set to the season and the value set to the list of storms in that season.
- Run your code and check that the resulting dictionary appears as expected (a sketch of the finished chunk follows this list). Debug as needed.
- Stage and commit your revised code.
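Put together, the finished chunk might resemble this sketch. Again, `ibtracs_points` stands in for your own path variable.

```python
import arcpy

ibtracs_points = r"data/raw/IBTrACS_NA.shp"        # placeholder: use your existing path variable

storms_by_season = {}                               # key = season, value = list of storm names

for season in range(2000, 2025):                    # 2000 through 2024, inclusive
    storm_names = []
    cursor = arcpy.da.SearchCursor(
        ibtracs_points, ["NAME"], where_clause=f"SEASON = {season}")
    for (name,) in cursor:
        if name != "UNNAMED" and name not in storm_names:
            storm_names.append(name)                # named storms only, no duplicates
    del cursor
    storms_by_season[season] = storm_names          # add this season's list to the dictionary

print(storms_by_season[2000])                       # quick check of one season's list
```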
🟦Task 5. Process all seasons, all storms
Now that we have our dictionary of seasons and storms, we can set up iterations to loop through each season and tally the total number of counties affected in that season. We’ll then add code that writes these totals to a CSV.
✅ 5.1 Iterate through years and then storms, computing the total # of counties affected each season
This code will come after the code that created the dictionary, and it will wrap our sequence of geoprocessing commands that select points for a given storm, convert them to a line, select counties that intersect that line, and finally compute the number of counties selected. (A sketch of the complete loop appears after the steps below.)
- First, create a Python file object, in write mode, specifying a CSV file stored in our `data/processed` folder that we’ll create to store our output.

  😎 Tip: You may want to specify the file name & path for this output file early in your script so that it’s more easily modifiable.

- Write a header row to this file object: `<file_object>.write('season, counties_affected\n')` (Don’t forget the `\n` at the end to add a new line!)
- Create a for loop that iterates through each key in the dictionary created in Step 4.2.
- Within the loop for each key/season:
  - Extract the list of storms for that season from the dictionary into a new variable.
  - Initialize an integer variable (i.e., set it to 0) that will cumulatively tally the number of counties affected by storms in that season.
  - Create a for loop that iterates through each storm name.
  - Within the loop for each storm name:
    - [Optional] Add a line that prints the storm season and storm name, to monitor progress as your script runs.
    - Indent the geoprocessing code that selects, connects, intersects, and tallies counties affected by the storm. Be sure the code selects the appropriate storm, given the variables used in both your for loops.
    - Add the number of counties affected to the tally of counties affected for the current season.
  - Before moving to the next season, write a new line to your output file that includes the season and the count of counties. Again, don’t forget to add a newline character (`\n`) to the end of the line.
- Once all seasons have been run, delete the file object to close the file.
- Run your code and check the output file, opening it in a text editor or Excel.
- Stage and commit your script. Finished!
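Pulling the steps above together, the season/storm loop might resemble the sketch below. The geoprocessing calls and variable names (`ibtracs_points`, `counties_fc`, the `memory` output, etc.) are assumptions meant to mirror the workflow from your notebook; substitute the tools and variables your own script already uses.

```python
import arcpy

arcpy.env.overwriteOutput = True                    # allow the in-memory track to be overwritten

# --- Assumed inputs; substitute the variables already defined in your script ---
ibtracs_points = r"data/raw/IBTrACS_NA.shp"         # IBTrACS point feature class
counties_fc    = r"data/raw/counties.shp"           # county feature class (hypothetical path)
output_csv     = r"data/processed/county_counts.csv"
# storms_by_season is the {season: [storm names]} dictionary built in Step 4.2

out_file = open(output_csv, "w")                    # file object in write mode
out_file.write("season, counties_affected\n")       # header row (note the newline)

for season, storm_names in storms_by_season.items():
    season_total = 0                                # tally of counties for this season

    for storm_name in storm_names:
        print(f"Processing {season} - {storm_name}")    # monitor progress

        # Select the IBTrACS points for this storm (season + name)
        where = f"SEASON = {season} AND NAME = '{storm_name}'"
        storm_points = arcpy.management.SelectLayerByAttribute(
            ibtracs_points, "NEW_SELECTION", where).getOutput(0)

        # Convert the selected points to a line (the storm track)
        storm_track = arcpy.management.PointsToLine(storm_points, r"memory\storm_track")

        # Select counties intersecting the storm track, then count them
        select_result = arcpy.management.SelectLayerByLocation(
            counties_fc, "INTERSECT", storm_track).getOutput(0)
        season_total += int(arcpy.management.GetCount(select_result).getOutput(0))

    out_file.write(f"{season}, {season_total}\n")   # one row per season

out_file.close()  # closes the file (deleting the file object, as described above, also works)
```

If you also want the optional plot of the time series, one quick approach (assuming the pandas and matplotlib packages are available in your ArcGIS Pro Python environment) is to read the CSV back in and plot it:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(output_csv, skipinitialspace=True)        # skip the space after each comma
df.plot(x="season", y="counties_affected", marker="o")     # simple time-series line plot
plt.ylabel("Counties affected")
plt.title("Storm-affected counties per season, 2000-2024")
plt.show()
```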
Recap
You’ve now leveraged your knowledge of basic Python (variables, loops, file objects, etc.) along with your knowledge of the ArcPy package to automate a task that may not even be possible with ArcGIS Pro’s ModelBuilder alone. By developing a geoprocessing workflow and wrapping it in a few loops, you take the work out of your hands and put it onto the computer. This example, of course, is a very basic one, but the principles are the same for more complex ones. And down the road, if you continue with this, you will likely look into parallelizing your code, meaning splitting intensive processes out to be executed in parallel across different processors or different machines. That’s not something we’ll cover in this class, but you now at least see its utility!
Up next, we are going to see how additional Python packages facilitate geoprocessing and automation even more!