ENV 859

layout: lesson2 title: Getting Started with ArcPy author: John Fay permalink: /python_gis/arcpy_getting_started.html sidebar: nav: python_gis

We’ll begin our journey working with ArcPy by running a Jupyter notebook inside ArcGIS Pro, examining how geoprocessing tools can be run via Python commands. From there, we’ll dive deeper into coding with geoprocessing tools, exploring how to code an entire geospatial workflow in a Jupyter notebook, and how this offers a powerful alternative to running tools within ArcGIS Pro.

Revisiting the Hurricane Mapping Tool - With `ArcPy`

Did you know Jupyter notebooks can be written and executed right within ArcGIS Pro? The interface is a bit different, and it behaves slightly erratically at times, but it’s a useful place to begin our exploration of ArcPy and writing code with geoprocessing tools.

✅Create a Jupyter notebook inside ArcGIS Pro

Open the HurricaneMapper_arcpy.aprx Project.
Add IBTracs_NA.shp to the map.
From the Analysis menu, select Python>Python notebook
In the Contents pane, rename the notebook “HurricaneMapper”.
Note where this file is saved in your workspace…

✅Testing out the notebook

Familiarize yourself with the controls for this notebook. It’s a bit different than standalone notebooks.
Create and run a code cell to print “Hello World”.
Create a markdown cell and add some formatted text to it. Make this the first cell in the notebook.
Add a code cell with the code x = 5, and run it.
Close the notebook, then re-open it.
Add a new code cell with the code print(x). Does x retain its value?

✅Adding Geoprocessing Tools

Run the Select tool to select IBTracs_NA points from Helene in 2024, setting the output to go to an “in memory” feature class named “SelectedPoints”.
Open the History Pane and drag the Select tool you just ran into the notebook. This will add the Python format of the geoprocessing tool into your notebook!
Run the tool from within the Notebook to ensure it works.
Run the Points to Line tool and add it to your notebook.
In the code, change the storm to “CHANTAL” in 2025 and re-run the entire notebook.
CHALLENGE: See if you can add a code cell before the one that runs the Select process, setting the storm season and name to variables. Then modify the other code cells to use those variables, naming the track line output with the storm name and season in the layer name (e.g. “CHANTAL_2025_Track”).
Save your notebook.

😎Recap - Running Geoprocessing Tools

We can create and run Jupyter Notebooks right from within ArcGIS Pro!

After running a geoprocessing tool, we can add it to our notebook by dragging it in from the History pane.

All ArcGIS Pro geoprocessing tools are available as Python functions, and like most other Python functions, running them is a matter of calling the function name and its parameters.

ArcPy geoprocessing function names consist of three parts: arcpy, followed by the toolbox alias (e.g., analysis), and then the tool name (e.g., Select), giving arcpy.analysis.Select.

Different ArcPy functions have different sets of parameters.

The output of an ArcPy function is often specified as a parameter of the tool, e.g., out_feature_class=r"memory\SelectedPoints"

So, running a geoprocessing tool in Python is mostly a matter of learning the proper syntax for the tool and constructing it within code. Using the History of a tool you’ve run is one way of getting the syntax, but it’s a bit cumbersome.

A deeper dive into running Geoprocessing tools

Let’s get back to working with a Jupyter notebook outside of ArcGIS Pro and look at more efficient ways of learning and executing geoprocessing tools in our code. Here, we’ll explore how to point our tools to datasets.

✅ArcPy in stand-alone Jupyter Notebooks

Make a duplicate of the notebook you created above. We’ll be making some changes, so we’ll keep the original as is.
Open your HurricaneMapper_arcpy workspace in VS Code.
Open the duplicated notebook and run it.

You’ll see the script runs into an error in the “Select” geoprocessing code block. Why? We need to import the arcpy package.
Add a code block at the top of your notebook, and insert code to import the arcpy package:
```
#Import packages
import arcpy
```
The same code chunk still raises an error. It can’t find the input feature class. When run from ArcGIS Pro, it found the layer in the map’s table of contents. How, then, are inputs found outside of ArcGIS Pro??

✅Accessing datasets in `arcpy` geoprocessing tools

Nearly every geoprocessing tool involves some sort of dataset (feature class, raster dataset, table, etc.) as an input. To access these datasets in our code, we provide the paths to where the datasets live on our machine. We have a few ways of specifying paths, and some are more robust than others. Let’s look at some examples.

🔶Absolute paths

We can provide the absolute location where the dataset lives, i.e., the full path including the drive letter. In my case, my HurricaneMapper_arcpy folder lives on the V: drive, so the IBTracs_NA.shp file lives at V:\HurricaneMapper_arcpy\Data\IBTracs_NA.shp. We can use that in the Select geoprocessing tool.

Change the arcpy.analysis.Select tool as follows. Note that we double the backslashes in the path; be sure you understand why! (In an ordinary Python string, a single backslash begins an escape sequence such as \n.)

```
arcpy.analysis.Select(
    in_features="V:\\HurricaneMapper_arcpy\\Data\\IBTracs_NA.shp",
    out_feature_class=r"memory\SelectedPoints",
    where_clause=f"SEASON = 2024 And NAME = 'HELENE'"
)
```

Run the tool. It should run, but you won’t see any output as it’s stored in memory. We’ll come back to that.
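To see why the backslashes are doubled, it can help to compare how Python reads the same path written three ways. This is a small illustration in plain Python (no arcpy needed); the path below is made up for the example and is not one of the lesson's datasets.

```python
# In an ordinary Python string, a backslash starts an escape sequence,
# so "\n" becomes a newline and "\t" becomes a tab.
risky = "V:\new_data\tracks.shp"      # hypothetical path, for illustration only
doubled = "V:\\new_data\\tracks.shp"  # each "\\" is read as a single backslash
raw = r"V:\new_data\tracks.shp"       # raw string: backslashes are left alone

print(repr(risky))     # shows the control characters hiding in the path
print(doubled)         # V:\new_data\tracks.shp
print(doubled == raw)  # True
```

Doubling the backslashes and using a raw string (the r"..." form already used for the memory workspace) produce the same path; which one to use is largely a matter of taste.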

🔶Relative paths

Absolute paths work, but only when the data are found in that exact location; moving our coding workspace to, say, the C: drive would break our code because it would still look for the data on the V: drive. Instead, it’s good practice to use relative paths to refer to datasets. Here, paths are relative to a fixed starting point in our coding folder, usually the folder containing the script itself. The pathlib package’s Path class comes in handy here.

Add from pathlib import Path to your first code cell, above or below the line where you imported arcpy.
To see what our current working directory is, add a new code chunk below the one where you import the packages, and add the code:
```
Path.cwd()
```
and run it.
Change the code cell where you ran Path.cwd() to
```
the_file = Path.cwd() / 'data' / 'IBtracs_NA.shp'
print(the_file)
```
The first line appends the folder and file name to the project folder, and saves this to the variable the_file. And the second line reveals that the path saved is actually an absolute path. However, now if we moved our project folder somewhere else, the Path.cwd() portion would update to that location.
It can be good practice to ensure that the path created actually points to where the file lives. Replace the print(the_file) statement with the_file.exists(). That will return True if the path points to an actual file.
We are now almost ready to update the arcpy.analysis.Select tool to use the relative path. The one catch is that our the_file variable is a pathlib.WindowsPath object, and arcpy wants the path as a string. But that’s easy to change with the str() function.

Update the code to the following:
```
arcpy.analysis.Select(
    in_features=str(Path.cwd() / 'data' / 'IBtracs_NA.shp'),
    out_feature_class=r"memory\SelectedPoints",
    where_clause=f"SEASON = 2024 And NAME = 'HELENE'"
)
```
Run the code. You are likely to get an error:
ERROR 000725: Output Feature Class: Dataset memory\SelectedPoints already exists.
For now, in the same code cell where you imported arcpy add the following statement at the end. We’ll explain shortly what this does.
```
arcpy.env.overwriteOutput = True
```
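The existence check and the str() conversion described above can be bundled into one small function, so that a mistyped path fails early with a clear message rather than inside a geoprocessing tool. The sketch below is one possible way to do this; dataset_path is a hypothetical helper name, not part of arcpy or pathlib.

```python
from pathlib import Path

def dataset_path(*parts, base=None):
    """Build a path under `base` (default: the current working directory)
    and return it as a string, raising an error if nothing exists there.

    Hypothetical helper for illustration; not part of arcpy or pathlib.
    """
    base = Path.cwd() if base is None else Path(base)
    full_path = base.joinpath(*parts)
    if not full_path.exists():
        raise FileNotFoundError(f"Dataset not found: {full_path}")
    return str(full_path)

# Report a readable message if the shapefile is not where we expect it
try:
    print(dataset_path("data", "IBtracs_NA.shp"))
except FileNotFoundError as err:
    print(err)
```

The returned string could then be passed as in_features in place of the inline str(Path.cwd() / ...) expression.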

🔶Setting the workspace environment

Like ArcGIS Pro, you can set default environment settings such as the current workspace, scratch workspace, default coordinate reference system, etc. This is done through the arcpy.env module. We just set one environment setting, the “overwriteOutput” setting above, which enabled our coding environment to, as you likely guessed, overwrite output.

If we set the workspace environment, then any time we don’t specify the full (absolute or relative) path to a dataset, arcpy will look for it in the location we set for this environment. Let’s try it.

Just above where you set the arcpy.env.overwriteOutput to True, set the workspace environment to our data folder:
```
arcpy.env.workspace = str(Path.cwd() / 'data')
```

Now we can just pass the filename, not the full path of the IBTracs_NA.shp file in the code cell:

```
arcpy.analysis.Select(
    in_features='IBtracs_NA.shp',
    out_feature_class=r"memory\SelectedPoints",
    where_clause=f"SEASON = 2024 And NAME = 'HELENE'"
)
```
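As a mental model for what the workspace setting does, the lookup can be imitated in plain Python. The sketch below is only a conceptual stand-in, assuming the simple rule described above (full paths are used as given, bare names are looked up in the workspace); it is not arcpy's actual implementation, it ignores special cases such as the memory workspace, and resolve_dataset is a hypothetical name.

```python
from pathlib import Path

def resolve_dataset(name, workspace):
    """Imitate how a dataset name is located relative to a workspace.

    Conceptual illustration only; not how arcpy is implemented.
    """
    candidate = Path(name)
    # A name that is already a full path is used as given
    if candidate.is_absolute():
        return str(candidate)
    # Otherwise the name is interpreted relative to the workspace
    return str(Path(workspace) / candidate)

workspace = Path.cwd() / "data"
print(resolve_dataset("IBtracs_NA.shp", workspace))
print(resolve_dataset(Path.cwd() / "other" / "Roads.shp", workspace))
```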

😎Recap - Datasets in Geoprocessing Tools

Geoprocessing tools access datasets via their file pathnames, and we have a variety of ways of providing those pathnames. Relative paths are the most portable, and pathlib’s Path class is useful for building them, whether the path points to the file itself or is used to set arcpy’s workspace environment setting.

We also introduced arcpy.env as a way of getting and setting environment variables, which extends well beyond the workspace setting. See this resource for more information on that: Environment settings in Python
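Environment settings can also be applied temporarily. The sketch below uses arcpy.EnvManager as a context manager so the settings apply only inside the with block; it is a minimal example that assumes a data folder in the current working directory, and it guards the arcpy import so the cell still runs (printing the intended settings) on a machine without ArcGIS Pro.

```python
import importlib.util
from pathlib import Path

# Settings we want to apply only for a limited stretch of code
env_settings = {
    "workspace": str(Path.cwd() / "data"),
    "overwriteOutput": True,
}

if importlib.util.find_spec("arcpy") is not None:
    import arcpy

    with arcpy.EnvManager(**env_settings):
        # Inside the block, the temporary settings are in effect
        print("Inside block:", arcpy.env.workspace)
    # After the block, the previous settings are restored
    print("After block:", arcpy.env.workspace)
else:
    print("arcpy is not available; settings would be:", env_settings)
```

This can be a tidy alternative to setting arcpy.env values globally when only one or two tools need a different workspace.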

✅Tool Outputs in Scripts

You may have noticed that the products generated in geoprocessing tools, like inputs, are specified as file paths. We, of course, have been conveniently writing outputs to memory, but if we want to save the outputs to disk, we provide a path and filename for the file to be created. (Can you guess where the file would go if we only provided a filename, no path??).

Let’s try this with our code, writing the storm track to a file called “StormTrack.shp” in the data folder.

Replace the code in the Points To Line code cell with:

```
arcpy.management.PointsToLine(
    Input_Features=r"memory\SelectedPoints",
    Output_Feature_Class=str(Path.cwd()/'data'/'StormTrack.shp'),
    Line_Field=None,
    Sort_Field="ISO_TIME",
    Close_Line="NO_CLOSE",
    Line_Construction_Method="CONTINUOUS",
    Attribute_Source="NONE",
    Transfer_Fields=None
)
```

Run the tool, and you should see a new shapefile added to your data folder.

That works, but we can improve our code by setting the filename to a variable so that we can reuse the code.

Change the code such that we create a variable (storm_track) for the output feature class and use that in our tool:

```
storm_track = str(Path.cwd()/'data'/'StormTrack.shp')

arcpy.management.PointsToLine(
    Input_Features=r"memory\SelectedPoints",
    Output_Feature_Class=storm_track,
    Line_Field=None,
    Sort_Field="ISO_TIME",
    Close_Line="NO_CLOSE",
    Line_Construction_Method="CONTINUOUS",
    Attribute_Source="NONE",
    Transfer_Fields=None
)
```

Now we have the storm_track variable that points to the StormTrack.shp feature class, making it easy to use in subsequent geoprocessing tools.
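The same idea extends to the storm itself: if the season and name live in variables, the query and the output filename can both be built from them. The sketch below shows one possible way to do this in plain Python; the variable names storm_season, storm_name, and track_name are illustrative choices, and the CHANTAL/2025 values are the ones used earlier in this exercise.

```python
from pathlib import Path

# Change these two values to map a different storm
storm_season = 2025
storm_name = "CHANTAL"

# Query string for the Select tool's where_clause parameter
where_clause = f"SEASON = {storm_season} And NAME = '{storm_name}'"

# Output name and path that include the storm name and season
track_name = f"{storm_name}_{storm_season}_Track"
storm_track = str(Path.cwd() / "data" / f"{track_name}.shp")

print(where_clause)  # SEASON = 2025 And NAME = 'CHANTAL'
print(track_name)    # CHANTAL_2025_Track
```

The where_clause and storm_track variables could then be passed to the Select and Points To Line tools in place of the hard-coded strings.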

Alternatively, we can save the “result object” of a tool. See this section for greater detail on working with this, but here’s an example of how it works:

Change the code as follows:

```
result = arcpy.management.PointsToLine(
    Input_Features=r"memory\SelectedPoints",
    Output_Feature_Class="memory/storm_track",
    Line_Field=None,
    Sort_Field="ISO_TIME",
    Close_Line="NO_CLOSE",
    Line_Construction_Method="CONTINUOUS",
    Attribute_Source="NONE",
    Transfer_Fields=None
)
```

Now we have a result object, from which we can get the output feature class.

Add a new code cell to store the output feature class as a variable.
```
storm_track = result.getOutput(0)
```
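A result object can hold more than one output, which is why getOutput takes an index. The sketch below loops over all outputs using the result's outputCount property and getOutput method. So that it runs without ArcGIS, it uses a small stand-in class (DemoResult, a made-up name) in place of a real result; in the notebook you would pass the result returned by the tool instead.

```python
def list_outputs(result):
    """Return every output stored in a geoprocessing result object."""
    return [result.getOutput(i) for i in range(result.outputCount)]

class DemoResult:
    """Stand-in for an arcpy result object, for illustration only."""

    def __init__(self, outputs):
        self._outputs = list(outputs)
        self.outputCount = len(self._outputs)

    def getOutput(self, index):
        return self._outputs[index]

# With arcpy, `result` would come from a tool such as PointsToLine
result = DemoResult([r"memory\storm_track"])
outputs = list_outputs(result)
print(outputs)
```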

😎Recap - Tool Outputs

The outputs of geoprocessing tools are often new datasets that are created at the path+filename provided as one of the tool’s parameters. As we are likely to want to reference this output later in our script, it’s useful to store this path+filename as a variable. We can do this explicitly before running the tool, or after the fact by saving the result of the tool and extracting the output from that object via its .getOutput(0) method.

✅More on the parameters associated with ArcPy geoprocessing tools

It’s a bit cumbersome to have to run a tool in ArcGIS Pro to expose its Python equivalent. Fortunately, between ArcGIS Pro’s documentation and VS Code’s IntelliSense functionality (and yes, AI chatbots and code assistants too), finding out how to correctly code a geoprocessing tool is not too challenging.
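Another quick option is to ask arcpy itself. The sketch below wraps arcpy.Usage, which returns the syntax of a tool as a string, in a hypothetical helper named tool_usage; the helper returns None when arcpy is not installed so that the cell still runs outside an ArcGIS Pro Python environment. The tool is named here in the ToolName_toolboxalias form.

```python
import importlib.util

def tool_usage(tool_name):
    """Return the syntax string for a tool, or None if arcpy is unavailable.

    Hypothetical helper for illustration; the real work is done by arcpy.Usage.
    """
    if importlib.util.find_spec("arcpy") is None:
        return None
    import arcpy
    return arcpy.Usage(tool_name)

usage = tool_usage("MakeFeatureLayer_management")
print(usage)
```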

In our notebook, we next want to move on to the section where we extract the county features that intersect the storm track we created. There’s a slight wrinkle in that our counties feature class is a web service layer, but this just means we have to run the Make Feature Layer tool before we can run the Select Layer By Location tool. We’ll use this as an opportunity to learn and apply new geoprocessing tools in our notebook, starting with Make Feature Layer.

First, look up the syntax of the Make Feature Layer tool in the ArcGIS Pro documentation. A few ways to do this are:
- Do a web search for the tool help page, e.g. search for “ArcGIS Pro Make Feature Layer”
- Open the help for the tool in ArcGIS Pro
- Browse the ArcGIS Pro help for the tool.
Either way, this should bring you to the following link. (Be sure the help page matches the version of ArcGIS Pro you are using.)
Scroll down to the Parameters section of the page and click on the Python tab to reveal the Python syntax for the tool. Here you see detailed explanations of each parameter, required and optional for the tool as well as some scripting examples.
Add a new code cell to your Jupyter notebook in VS Code.
- In this code cell, start typing the geoprocessing function and notice that IntelliSense starts helping you. After you type the opening parenthesis (“(”), the tool syntax appears.
- Alternatively, you can copy the full tool syntax from the ArcPy help page and paste it into your code cell. I like to enter each parameter on a new line, so that I can comment out optional parameters I don’t want to specify:
```
arcpy.management.MakeFeatureLayer(
    in_features=, 
    out_layer=,
    #where_clause=, 
    #workspace=, 
    #field_info=
)
```
- Then populate the items:
```
in_features='https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0'
out_layer='county_features'
```
- And finally, save the output to a variable. The final code cell should be:
```
counties = arcpy.management.MakeFeatureLayer(
    in_features='https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0',
    out_layer='County_features'
).getOutput(0)
```
And now we have a variable “counties” that we can use as an input to the next process.
Try on your own to add a code cell that applies the Select Layer By Location to select features in the county layer that intersect the storm track features. Save the result as a variable named “selected_counties”
Try it again with the Copy Features tool, saving the output as “Affected_Counties.shp” in the data folder.

You’ve just replicated your Storm Tracking geoprocessing workflow in Python! You may well be thinking: that was a lot easier in ArcGIS Pro. Why all the effort in Python then? Well, in the next exercise, we’ll spice up our Notebook and explore some of the other capabilities of ArcPy as well as of what coding can do for us.
