layout: lesson2 title: Getting Started with ArcPy author: John Fay permalink: /python_gis/arcpy_getting_started.html sidebar: nav: python_gis
We’ll begin our journey working with ArcPy by running a Jupyter notebook inside ArcGIS Pro, examining how geoprocessing tools can be run via Python commands. From there, we’ll dive deeper into coding with geoprocessing tools, exploring how to code an entire geospatial workflow in a Jupyter notebook, and how this offers a powerful alternative to running tools within ArcGIS Pro.
Revisiting the Hurricane Mapping Tool - With ArcPy
Did you know Jupyter notebooks can be written and executed right within ArcGIS Pro? The interface is a bit different, and it behaves slightly erratically at times, but it’s a useful place to begin our exploration of ArcPy and writing code with geoprocessing tools.
✅Create a Jupyter notebook inside ArcGIS Pro
- Open the HurricaneMapper_arcpy.aprx project.
- Add IBTracs_NA.shp to the map.
- From the Analysis menu, select Python > Python notebook.
- In the Contents pane, rename the notebook “HurricaneMapper”.
- Note where this file is saved in your workspace…
✅Testing out the notebook
- Familiarize yourself with the controls for this notebook. It’s a bit different than standalone notebooks.
- Create and run a code cell to print “Hello World”.
- Create a markdown cell and add some formatted text to it. Make this the first cell in the notebook.
- Add a code cell with the code x = 5, and run it.
- Close the notebook, then re-open it.
- Add a new code cell with the code print(x). Does x retain its value?
✅Adding Geoprocessing Tools
- Run the Select tool to select the IBTracs_NA points for Helene in 2024, setting the output to an “in memory” feature class named “SelectedPoints”.
- Open the History Pane and drag the Select tool you just ran into the notebook. This will add the Python format of the geoprocessing tool into your notebook!
- Run the tool from within the Notebook to ensure it works.
- Run the Points to Line tool, then add it to your notebook in the same way.
- In the code, change the storm to “CHANTAL” in 2025 and re-run the entire notebook.
- CHALLENGE: See if you can add a code cell before the one that runs the Select process, assigning the storm season and name to variables. Then modify the other code cells to use those variables, including the storm name and season in the name of the track line output (e.g. “CHANTAL_2025_Track”).
- Save your notebook.
😎Recap - Running Geoprocessing Tools
- We can create and run Jupyter Notebooks right from within ArcGIS Pro!
- Geoprocessing tools can be added to our notebooks after running by accessing the History.
- All ArcGIS Pro geoprocessing tools are available as Python functions, and like most other Python functions, running them is a matter of calling the function name and its parameters.
- ArcPy geoprocessing function names consist of three parts: arcpy, followed by the toolbox name (analysis), and then the tool name (Select).
- Different ArcPy functions have different sets of parameters.
- The output of an ArcPy function is often specified as a parameter of the tool, e.g., out_feature_class=r"memory\SelectedPoints"
So, running a geoprocessing tool in Python is mostly a matter of learning the proper syntax for the tool and constructing it within code. Using the History of a tool you’ve run is one way of getting the syntax, but it’s a bit cumbersome.
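Because a tool call is just a function name plus its parameters, the parameters can be assembled in plain Python before the call is made. The sketch below is one possible way to do that for the Select tool. The variable names (storm_season, storm_name, select_params) and the "IBTracs_NA" layer name are illustrative assumptions, and the ArcPy call itself is left as a comment so the cell also runs where ArcGIS Pro is not installed.

```python
# Illustrative variable names; adjust them to your own notebook.
storm_season = 2024
storm_name = "HELENE"

# Parameters for arcpy.analysis.Select, gathered in a dictionary.
select_params = {
    "in_features": "IBTracs_NA",  # assumed name of the layer in the map
    "out_feature_class": r"memory\SelectedPoints",
    "where_clause": f"SEASON = {storm_season} And NAME = '{storm_name}'",
}

print(select_params["where_clause"])  # SEASON = 2024 And NAME = 'HELENE'

# Where arcpy is available, the tool could then be run with:
# arcpy.analysis.Select(**select_params)
```

Changing the two variables at the top is then enough to point the same call at a different storm.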
A deeper dive into running Geoprocessing tools
Let’s get back to working with a Jupyter notebook outside of ArcGIS Pro and look at more efficient ways of learning and executing geoprocessing tools in our code. Here, we’ll explore how to point our tools to datasets.
✅ArcPy in stand-alone Jupyter Notebooks
- Make a duplicate of the notebook you created above. We’ll be making some changes, so we’ll keep the original as is.
- Open your HurricaneMapper_arcpy workspace in VS Code.
- Open the duplicated notebook and run it.
  You’ll see the script runs into an error in the “Select” geoprocessing code block. Why? We need to import the arcpy package.
- Add a code block at the top of your notebook, and insert code to import the arcpy package:

    # Import packages
    import arcpy

The same code chunk still raises an error: it can’t find the input feature class. When run from ArcGIS Pro, it found the layer in the map’s table of contents. How, then, are inputs found outside of ArcGIS Pro?
✅Accessing datasets in arcpy geoprocessing tools
Nearly every geoprocessing tool involves some sort of dataset (feature class, raster dataset, table, etc.) as an input. To access these datasets in our code, we provide the paths to where the datasets live on our machine. We have a few ways of specifying paths, and some are more robust than others. Let’s look at some examples.
🔶Absolute paths
We can provide the absolute location where the dataset lives, i.e., the full path including the drive letter. In my case, my HurricaneMapper_arcpy workspace lives on the V: drive, so the IBTracs_NA.shp file lives at V:\HurricaneMapper_arcpy\Data\IBTracs_NA.shp. We can use that in the Select geoprocessing tool.
- Change the arcpy.analysis.Select tool as follows. Note that we double the backslashes in the path; be sure you understand why!

    arcpy.analysis.Select(
        in_features="V:\\HurricaneMapper_arcpy\\Data\\IBTracs_NA.shp",
        out_feature_class=r"memory\SelectedPoints",
        where_clause="SEASON = 2024 And NAME = 'HELENE'"
    )

- Run the tool. It should run, but you won’t see any output as it’s stored in memory. We’ll come back to that.
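The reason for the doubled backslashes is that Python treats a single backslash inside an ordinary string as the start of an escape sequence. The short sketch below compares the two safe spellings of the path used above, then shows what goes wrong with a hypothetical path (C:\temp\new_data.shp is made up purely for illustration).

```python
# Two equivalent ways to write a Windows path in Python.
doubled = "V:\\HurricaneMapper_arcpy\\Data\\IBTracs_NA.shp"  # escaped backslashes
raw = r"V:\HurricaneMapper_arcpy\Data\IBTracs_NA.shp"        # raw string
print(doubled == raw)  # True

# Hypothetical path showing the problem: "\t" is read as a tab character
# and "\n" as a newline, so the string no longer matches the real path.
broken = "C:\temp\new_data.shp"
print("\t" in broken, "\n" in broken)  # True True
```

Either the doubled form or the r"..." raw-string form works; the lesson’s own code uses both.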
🔶Relative paths
Absolute paths work, but only when the data are found in that exact location; moving our coding workspace to, say, the C: drive would break our code because it would still be looking for the data on the V: drive. Instead, it’s good practice to use relative paths to refer to datasets. Here, paths are relative to a static starting point in our coding folder, usually the script itself. The pathlib module’s Path class comes in handy here.
- Add from pathlib import Path to your first code cell, above or below the line where you imported arcpy.
- To see what our current working directory is, add a new code chunk below the one where you import the packages, add the code Path.cwd(), and run it.
- Change the code cell where you ran Path.cwd() to:

    the_file = Path.cwd() / 'data' / 'IBTracs_NA.shp'
    print(the_file)

  The first line appends the folder and file name to the project folder, and saves the result to the variable the_file. The second line reveals that the path saved is actually an absolute path. However, if we now moved our project folder somewhere else, the Path.cwd() portion would update to that location.
- It can be good practice to ensure that the path created actually points to where the file lives. Replace the print(the_file) statement with the_file.exists(). That will return True if the path points to an actual file.
- We are now almost ready to update the arcpy.analysis.Select tool to use the relative path. The one catch is that our the_file variable is a pathlib.WindowsPath object, and arcpy wants the path as a string. But that’s easy to change with the str() function. Update the code to the following:

    arcpy.analysis.Select(
        in_features=str(Path.cwd() / 'data' / 'IBTracs_NA.shp'),
        out_feature_class=r"memory\SelectedPoints",
        where_clause="SEASON = 2024 And NAME = 'HELENE'"
    )

- Run the code. You are likely to get an error:

    ERROR 000725: Output Feature Class: Dataset memory\SelectedPoints already exists.

- For now, in the same code cell where you imported arcpy, add the following statement at the end. We’ll explain shortly what this does.

    arcpy.env.overwriteOutput = True
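The relative-path steps above (build the path, check that it exists, convert it to a string) can be gathered into one small helper. This is only a sketch: dataset_path is a hypothetical function name, and a temporary folder with an empty placeholder file stands in for the real project folder so the example runs anywhere.

```python
from pathlib import Path
import tempfile


def dataset_path(project_folder, filename, data_folder="data"):
    """Build the path to a dataset and return it as a string for arcpy."""
    path = Path(project_folder) / data_folder / filename
    if not path.exists():
        raise FileNotFoundError(f"Dataset not found: {path}")
    return str(path)


# A temporary folder stands in for the project folder in this demonstration.
with tempfile.TemporaryDirectory() as project_folder:
    data_dir = Path(project_folder) / "data"
    data_dir.mkdir()
    (data_dir / "IBTracs_NA.shp").touch()  # empty placeholder file

    found = dataset_path(project_folder, "IBTracs_NA.shp")
    print(found)

    # A missing file is reported before any geoprocessing tool is run.
    try:
        dataset_path(project_folder, "Missing.shp")
    except FileNotFoundError as error:
        print(error)
```

In the notebook, the equivalent call would be dataset_path(Path.cwd(), 'IBTracs_NA.shp'), with the returned string passed to in_features.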
🔶Setting the workspace environment
As in ArcGIS Pro, you can set default environment settings such as the current workspace, scratch workspace, default coordinate reference system, etc. This is done through the arcpy.env class. We just set one environment setting above, overwriteOutput, which, as you likely guessed, allows our code to overwrite existing outputs.
If we set the workspace environment, then any time we don’t specify the full path (absolute or relative) to a dataset, arcpy will look for it in the location we assigned to this environment. Let’s try it.
- Just above where you set arcpy.env.overwriteOutput to True, set the workspace environment to our data folder:

    arcpy.env.workspace = str(Path.cwd() / 'data')

- Now we can pass just the filename, not the full path, of the IBTracs_NA.shp file in the code cell:

    arcpy.analysis.Select(
        in_features='IBTracs_NA.shp',
        out_feature_class=r"memory\SelectedPoints",
        where_clause="SEASON = 2024 And NAME = 'HELENE'"
    )
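The two environment settings used so far can be applied together in one place near the top of the notebook. The sketch below wraps them in a hypothetical helper, configure_environment (not part of ArcPy), and guards the import so the cell still runs on a machine without ArcGIS Pro; in that case it simply returns the values it would have set.

```python
from pathlib import Path

try:
    import arcpy  # available in ArcGIS Pro's Python environment
except ImportError:
    arcpy = None  # lets the sketch run where ArcGIS Pro is not installed


def configure_environment(data_folder, overwrite=True):
    """Apply the lesson's two environment settings and return them."""
    settings = {
        "workspace": str(data_folder),  # arcpy expects a string, not a Path
        "overwriteOutput": overwrite,
    }
    if arcpy is not None:
        arcpy.env.workspace = settings["workspace"]
        arcpy.env.overwriteOutput = settings["overwriteOutput"]
    return settings


settings = configure_environment(Path.cwd() / "data")
print(settings)
```

Keeping the settings in one cell makes it easier to see which defaults every later tool call relies on.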
😎Recap - Datasets in Geoprocessing Tools
- Geoprocessing tools access datasets via their file pathnames, and we have a variety of ways of providing those pathnames. Relative paths keep our code portable, and pathlib's Path class is useful for building them, whether it’s to point to the file itself or to set arcpy’s workspace environment setting.
- We also introduced arcpy.env as a way of getting and setting environment settings, which extend well beyond the workspace setting. See this resource for more information on that: Environment settings in Python
✅Tool Outputs in Scripts
You may have noticed that the products generated by geoprocessing tools, like the inputs, are specified as file paths. So far we have been conveniently writing outputs to memory, but if we want to save an output to disk, we provide a path and filename for the file to be created. (Can you guess where the file would go if we provided only a filename and no path?)
Let’s try this with our code, writing the storm track to a file called “StormTrack.shp” in the data folder.
- Replace the code in the Points To Line code cell with:

    arcpy.management.PointsToLine(
        Input_Features=r"memory\SelectedPoints",
        Output_Feature_Class=str(Path.cwd() / 'data' / 'StormTrack.shp'),
        Line_Field=None,
        Sort_Field="ISO_TIME",
        Close_Line="NO_CLOSE",
        Line_Construction_Method="CONTINUOUS",
        Attribute_Source="NONE",
        Transfer_Fields=None
    )

- Run the tool, and you should see a new shapefile added to your data folder.
That works, but we can improve our code by setting the filename to a variable so that we can reuse the code.
- Change the code such that we create a variable (storm_track) for the output feature class and use that in our tool:

    storm_track = str(Path.cwd() / 'data' / 'StormTrack.shp')
    arcpy.management.PointsToLine(
        Input_Features=r"memory\SelectedPoints",
        Output_Feature_Class=storm_track,
        Line_Field=None,
        Sort_Field="ISO_TIME",
        Close_Line="NO_CLOSE",
        Line_Construction_Method="CONTINUOUS",
        Attribute_Source="NONE",
        Transfer_Fields=None
    )
Now we have the storm_track variable, which holds the path to the StormTrack feature class, making it easy to use in subsequent geoprocessing tools.
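Storing the output path in a variable also makes it easy to build the filename from other variables. As a sketch, the hypothetical helper below (track_output_path is not part of ArcPy) names the shapefile after the storm and season, following the “CHANTAL_2025_Track” pattern suggested in the earlier challenge.

```python
from pathlib import Path


def track_output_path(data_folder, storm_name, season):
    """Return a shapefile path such as <data_folder>/HELENE_2024_Track.shp."""
    filename = f"{storm_name.upper()}_{season}_Track.shp"
    return str(Path(data_folder) / filename)


storm_track = track_output_path(Path.cwd() / "data", "Helene", 2024)
print(Path(storm_track).name)  # HELENE_2024_Track.shp
```

The resulting string could then be passed as Output_Feature_Class in the Points To Line call and reused by later tools.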
Alternatively, we can save the “result object” of a tool. See this section for greater detail on working with this, but here’s a quick look at how it works:
- Change the code as follows:

    result = arcpy.management.PointsToLine(
        Input_Features=r"memory\SelectedPoints",
        Output_Feature_Class="memory/storm_track",
        Line_Field=None,
        Sort_Field="ISO_TIME",
        Close_Line="NO_CLOSE",
        Line_Construction_Method="CONTINUOUS",
        Attribute_Source="NONE",
        Transfer_Fields=None
    )

  Now we have a result object, from which we can get the output feature class.
- Add a new code cell to store the output feature class as a variable.

    storm_track = result.getOutput(0)
😎Recap - Tool Outputs
The outputs of geoprocessing tools are often new datasets that are created at the path and filename provided as one of the tool’s parameters. As we are likely to want to reference this output later in our script, it’s useful to store this path and filename as a variable. We can do that explicitly before running the tool, or after the fact, by saving the result of the tool and extracting the output from it with the result object’s .getOutput(0) method.
✅More on the parameters associated with ArcPy geoprocessing tools
It’s a bit cumbersome to have to run a tool in ArcGIS Pro to expose its Python equivalent. Fortunately, between ArcGIS Pro’s documentation and VS Code’s IntelliSense functionality (and yes, AI chatbots and code assistants too), finding out how to correctly code a geoprocessing tool is not too challenging.
In our notebook, we next want to move on to the section where we extract the county features that intersect the storm track we created. There’s a slight wrinkle in that our counties feature class is a web service layer, but this just means we have to run the Make Feature Layer tool before we can run the Select Layer By Location tool. We’ll use this as an opportunity to learn and apply new geoprocessing tools in our notebook, starting with Make Feature Layer.
- First, look up the syntax of the Make Feature Layer tool in the ArcGIS Pro documentation. A few ways to do this are:
  - Do a web search for the tool help page, e.g. search for “ArcGIS Pro Make Feature Layer”
  - Open the help for the tool in ArcGIS Pro
  - Browse the ArcGIS Pro help for the tool.
  Whichever way you choose, this should bring you to the following link. (Be sure the help page matches the version of ArcGIS Pro you are using.)
- Scroll down to the Parameters section of the page and click on the Python tab to reveal the Python syntax for the tool. Here you see detailed explanations of each parameter, required and optional, as well as some scripting examples.
- Add a new code cell to your Jupyter notebook in VS Code.
- In this code cell, start typing the geoprocessing function and notice that IntelliSense starts helping you. After you type the opening parenthesis (“(”), the tool syntax appears.
- Alternatively, you can copy the full tool syntax from the ArcPy help page and paste it into your code cell. I like to enter each parameter on a new line, so that I can comment out optional parameters I don’t want to specify:

    arcpy.management.MakeFeatureLayer(
        in_features=,
        out_layer=,
        #where_clause=,
        #workspace=,
        #field_info=
    )

- Then populate the items:

    in_features='https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0',
    out_layer='county_features'
- And finally, save the output to a variable. The final code cell should be:

    counties_lyr = arcpy.management.MakeFeatureLayer(
        in_features='https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0',
        out_layer='county_features'
    ).getOutput(0)

  And now we have a variable, “counties_lyr”, that we can use as an input to the next process.
- Try on your own to add a code cell that applies the Select Layer By Location tool to select features in the county layer that intersect the storm track features. Save the result as a variable named “selected_counties”.
- Try it again with the Copy Features tool, saving the output as “Affected_Counties.shp” in the data folder.
You’ve just replicated your Storm Tracking geoprocessing workflow in Python! You may well be thinking: that was a lot easier in ArcGIS Pro. Why all the effort in Python then? Well, in the next exercise, we’ll spice up our Notebook and explore some of the other capabilities of ArcPy as well as of what coding can do for us.