Exercise 4: Performing Analysis in Python
Overview
The AGOL interface for doing analysis is fairly straight-forward, but it’s limited. We can actually do much more web-based GIS using the AGOL platform via a Python package. In this set of exercises, we examine how Python, run in Jupyter notebooks, enables us to perform web based geospatial analyses similar to what we can do via the AGOL web interface. However, by accessing these operations via a coding platform, we are free from the restrictions of what the AGOL interface places and can actually do much, much more.
Tasks
- Create a new Jupyter notebook in AGOL
- Basic notebook operations
- Tidying and uploading data to a hosted feature layer
- Executing Analysis
👉 If you are new to Jupyter notebooks (and Python), consider running through this tutorial. It covers the basics on how to create a notebook, working with notebooks, and some nifty things you do with them.
1. Creating a new Jupyter notebook
-
Log in to ArcGIS Online using your class account.
-
Click on the Notebook tab.
-
Click the button to create a new notebook.
Note you have three options here, with two requiring credits. We won’t be doing any heavy lifting in this exercise, so we can select the Standard notebook option.
A new notebook appears with a few text and code cells already populated. Next, we’ll take a quick look at a few of the basic operations in a Jupyter notebook.
-
Save your notebook. Your notebook is now stored within your AGOL account.
You can resume work on it later by opening it. You can also share your notebook with others.
2. Basic notebook operations
-
Click the first code cell, the one with the code “
from arcgis.gis import GIS
”, and click the ►Run button.
The first of these two code statements import theGIS
package - part of the ArcGIS Python API - into your active coding session. The second creates a secure connection, via the “gis” object, to your coding session to your AGOL account. -
Click the second [empty] code cell to activate it. Then expand the Add menu. Search your content for your Salamander points feature layer (possibly called
Devitt_et_all_2019_<netID>
), and click the (+) to add it to your coding session.
A new code cell is added to your notebook, populated with the code to add this resource to your coding session. -
Run this new code cell.
The Salamander points are now accessible by your code. The variableitem
is the coding analog to the feature service layer’s “Item Page”, a link to which is provided below.Items are generic AGOL resources. They can point to anything that AGOL can store, i.e., anything listed in the Contents tab: a feature service, a map service, a CSV file, etc. These items themselves don’t contain any data, rather they point to data. We need to dig a bit deeper to get access to the actual data objects…
- In the empty code cell below (click the plus
icon to add a new code cell if one does not exist), type the following and run it:
the_layers = item.layers the_layers
This creates a new variable (
the_layers
), assigning it the results of theitem.layers
statement. The value is a Python list of the layers associated with this feature layer item (Python lists are enclosed by square brackets:[]
.) This list contains only one layer, which we will extract into its own variable. -
In a new code cell, add the following command and run it:
the_layer = the_layers[0] the_layer.properties.geometryType
This code extracts the first (and only) layer associated with the layers, and reports one of its properties.
→ Try replacing
geometryType
withtype
to reveal a different property.→ Try removing everything after the
the_layer.properties.
and then hit thekey. This reveals all the properties you can inspect. -
In a new code cell, add the following command and run it:
my_map = gis.map( location='San Antonio, TX', zoomlevel = 7 ) my_map.add_layer(the_layer) my_map
You should see a map, centered on San Antonio, Texas showing our feature layer. Click on a feature and you will see its attributes.
- Save your notebook to your Salamanders folder.
We’ve just done the very basic steps of creating an analog to an AGOL web map. We don’t have the nifty graphic interface to change symbology, turn layers on and off, etc.; instead we must do that with lines of code. You wouldn’t be the first to question whether this is any easier than using the web interface, but coding does offer several advantages, which we will explore as we continue this exercises.
3. Tidying and uploading data to a hosted feature layer
We’ve already examined how easy it is to create a hosted feature layer on AGOL by uploading a CSV file and specifying the fields used to construct geometries, whether that be coordinates or placenames/addresses we can use to geocode locations. Once the data are hosted, however, AGOL doesn’t have much capability for cleaning up or wrangling data. For example, our original Salamander point dataset had a number of duplicated records that I had to tidy up before passing along to you.
In this next task, we explore Python’s ability to read in tabular data (CSV, tab-delimited, or Excel files) into a Pandas dataframe, whereby we can unleash an extensive Python toolkit for exploring, wrangling, and visualizing the data. Then, because our data has a spatial component, we “upgrade” our dataframe to a “spatially enabled dataframe” (SeDF), which further enables geospatial exploration, wrangling, and visualization to our dataset. This latter step is done via the ArcGIS API for Python - a Python package (i.e. add-on) that pulls AGOL’s geospatial capabilities into a Python coding environment.
In all, the combination of Panda's ability to work with dataframes and the ArcGIS API's ability to link AGOL's ability with these dataframes is a powerful force for doing web-based GIS. It does have a bit of a steep learning curve, but taken one step at a time, we can better understand all we can do with this dynamic duo. So, starting simply, we’ll explore the procedure that includes the following:
- Setting up our coding environment & importing the necessary Python packages
- Reading our Excel file contents into a Pandas Dataframe and tidying the data.
- Converting out Pandas dataframe into a spatially enabled dataframe
- Visualization and export of our data to AGOL
Here, we’ll be using ArcGIS Pro and will be saving files locally, not in the cloud. If you wish, you can use your ENV790 class drive to host these files. To do so, you can download a drive mapping script here.
3.1. Setting up our coding environment & importing the necessary Python packages
To start, we’ll create a new ArcGIS Pro Project, create a new Jupyter notebook in this project workspace, and add code to this notebook that imports the Python packages we need as well as link our notebook to our class AGOL account.
-
On your desktop, create a new ArcGIS Pro project.
-
Copy the “raw” Salamander Excel file (found here) into your project workspace.
-
Create a new Jupyter notebook within ArcGIS Pro:
- From the ArcGIS Pro Analysis menu, select Python Notebook from the Python dropdown menu.
-
Add the following code in the first code cell, update the username to your own (using your class login) and run:
from arcgis import GIS import pandas gis = GIS( url='https://dukeuniv.maps.arcgis.com', username = 'netID_env790' )
It will ask you for your AGOL account password, which you should enter at the prompt.
►This code snippet does the following:
- It imports the Pandas library. Pandas is used for working with dataframes…
- It imports the GIS module of the ArcGIS API for Python (“arcgis”). The GIS module enables interaction with ArcGIS Online via Python commands. Extensive documentation for the GIS module is found here.
- And finally, it establishes a connection to our AGOL account, stored as a variable we’ve named
gis
.
-
In a new code cell, enter and run:
gis.users.me
This will display your account info, by calling up the
users
sub-module of thegis
object.►The
gis
objects is organized into a number of sub-modules. The two you are most likely to use are:
3.2. Reading our Excel file contents into a Pandas Dataframe and tidying the data
-
Now we’ll load our Excel file into our coding environment as a Pandas dataframe.
Enter and run the following in a new code cell:df_raw = pandas.read_excel( io = 'Devitt_et_al_2019_raw.xlsx', index_col='FID' ) df_raw.head()
►Here, the first line initiates the ‘read_excel()’ command from the
Pandas
library, with the two subsequent lines (which fall between the parentheses and are indented), reflecting the parameters for this command.-
io=...
specifies the Excel file we are opening (which lives in the same folder as our notebook) -
index_col='FID'
is an optional parameter that sets the “FID” as the row index for our dataframe
When run, we create a new object we named
df_raw
to reference our newly created dataframe, constructed from the contents of the Excel file. Thedf_raw.head()
command displays the first 5 records of our dataframe.Our data is now accessible as a Pandas dataframe. Pandas is amazingly adept at all kinds of data analytic processes that we won’t go into here. Instead we’ll just use one operation which is to remove duplicates in our data.
-
-
Apply the
drop_duplicates()
command to clean our dataset;
Type and run the following code in a new code cell:df_processed = df_raw.drop_duplicates() print(df_raw.shape, df_processed.shape)
►The first command removes all duplicate rows from our dataframe, saving the output as the variable
df_processed
. The subsequent command prints the shape (rows and columns) of the two dataframes, revealing that we went from 367 to 109 records, i.e., that 258 records were exact duplicates.
3.3. Convert our Pandas dataframe to an ArcGIS Spatially Enabled Dataframe
-
Next, we convert our tidied dataframe into a spatially enabled dataframe (aka “SeDF” or “spatial dataframe”).
Add the following to a new code cell and execute:sdf_salamanders = df_processed.spatial.from_xy( df = df_raw, x_column='X', y_column='Y', sr = 4326 ) sdf_salamanders.head()
►Here, the
from_xy()
function is part of the arcgis module. The parameters we must provide include the dataframe containing the XY coordinate fields, what those fields are (i.e. their column names), and the spatial reference the coordinates use.The are two exceptionally unintuitive concepts introduced here:
-
First is that the
from_xy()
function is not applied to the Pandas dataframe directly. Instead, thespatial
term, when appended to the name of the dataframe object (i.e.df_processed.spatial
) causes Python to treat the object not as a Pandas dataframe, but as an ArcGIS API GeoAccessor. To simplify, adding.spatial
changes our dataframe’s functionality such that we can do things like convert it to a SeDF via thefrom_xy()
function. -
Second is how the spatial reference is defined. Here the
sr = 4326
sets the spatial reference to the WGS 1984 geographic coordinate system. Every coordinate reference system has a numeric code, which you can find via https://epsg.io or https://spatialreference.org.
When executed, the
from_xy()
command produces a new object which we reference with the variable namesdf_salamanders
. This object behaves both as a Pandas dataframe, if we omit the.spatial
or as a SeDF, if we add the.spatial
. The last command in the snippet reveals that we now have a new column associated with the dataframe: the geometry. -
3.4. Visualization and export of our data to AGOL
-
Let’s do a quick visualization of our SeDF with the following command:
sdf_salamanders.plot()
Alternatively, you could map it as we did before:
my_map = gis.map('San Antonio, TX', zoomlevel=8) my_map.add_layer(sdf_salamanders) my_map
-
To save our SeDF to AGOL as a hosted feature layer, we use the
to_featurelayer()
function. First, let’s glance at the structure of this function. In a new code cell type and run the followingsdf_salamanders.spatial.to_featurelayer?
The
?
at the end will display info on the function instead of running it. We see what information we need to supply. Modify your code cell so it reads (modifying the NetID in the title:feat_lyr = sdf_salamanders.spatial.to_featurelayer( title='Salamander_Pts_II_jpfay', gis = gis, tags = ('Salamanders','ENV790'), folder = 'Salamanders' ) feat_lyr
If successful, your code will display the item block for the feature service layer you just created.
4. Executing Analyses in AGOL via Python
4.1 Analysis: Spatial Join
In this exercise, we’ll mimic the spatial join we did using the AGOL interface, but using code to do it. We’ll build off the concepts we learned above regarding importing datasets, and then we’ll explore the ArcGIS Python API’s analytical tools.
The general workflow is as follows. After creating a new Jupyter notebooks, we’ll add the two datasets we want to spatially join to our coding environment. This will involve first locating and importing the dataset item, and then extracting the feature layer from each respective item. (Our operations work on feature layers, not items…). We’ll detour a bit and set the spatial extent of our analysis, choosing the “Lost Maples State Park, Vanderpool, TX” region to define this extent, as we did in the AGOL exercise. And finally, we’ll find and execute the ArcGIS Python API function that will perform the spatial join, saving our output as a new feature layer in our AGOL account.
4.1.1 Analysis set up
-
Create a new notebook in AGOL.
- Code should be automatically added to import the GIS module of the ArcGIS package and to connect your notebook to your AGOL account (here called “home”).
-
Add your Salamander Points item to your coding session.
- Rename your Python variable something different than
item
; this will be helpful later…
- Rename your Python variable something different than
-
Extract the Salamander Points layer from the the item above
- Assign the one and only layer to a new variable, e.g.
salamander_lyr
- Assign the one and only layer to a new variable, e.g.
-
Search for and add the USA Map Soils Layer item to your coding session and extract its one and only layer to a new variable (e.g.
soils_lyr
)
4.1.2 Using the geocoding module to set the analysis extent
Recall that in the AGOL-based exercise we entered a location to zoom our map to. Behind the scenes, AGOL was executing a geocoding operation: looking up the place name we supplied in a geocoding locator and returning coordinates corresponding to that location. We do the same procedure here, invoking the ArcGIS API’s geocoding
module to add that functionality to our session.
-
In a new code cell add the following and run:
from arcgis.geocoding import geocode
►This code imports the
geocode()
command (part of thegeocoding
module). More examples using the geocoding module is available here, with a reference page for the module here.
❗ You should note that geocoding costs credits; not a lot, but they add up… -
In a new code cell (or in a line below the one above), enter and run the following:
results = geocode("Lost Maples State Park, Vanderpool, TX") lost_maples = results[0]
►This code searches the geolocator for features matching the text we entered, which returns a list sorted by confidence. We take the first item in this list and store it as
lost_maples
. This item is a Python dictionary, essentially a collection of properties associated with our result. One of the items is the extent of our geolocated area… -
Examine the extent of the Lost Maples area with the following code:
lost_maples_extent = lost_maples['extent'] lost_maples_extent
We now have the information needed to set the extent of our analysis.
4.1.3 Performing the analysis
-
From the Analysis tab, add the Join Features function to a new code cell.
- Note that this imports a new module to your script:
features
- This
features
module contains a submodule namedsummarize_data
, which has the functionjoin_features()
- You can add a
?
to the end of thejoin_features()
command to quickly display its syntax. Or you can search for it in the API documentation.
- Note that this imports a new module to your script:
-
Add parameters to the the
join_features()
function:soils_overlay = features.summarize_data.join_features( target_layer = salamander_lyr, join_layer = soils_lyr, spatial_relationship='intersects', output_name='Salamander_Soils_II_jpfay', context= {'extent': {'xmin': -99.59682999999994, 'ymin': 29.78149000000003, 'xmax': -99.54482999999995, 'ymax': 29.83349000000003, "spatialReference": {"wkid": 4326} }}, estimate = True )
Be sure you understand what the inputs here represent. The one that should be discussed is the
context
input, in particular how I got these values…
4.1.4 Visualize the outputs
-
Create a map zoomed to our “Lost Maples State Park”. Set the basemap to satellite imagery, add our new layer, and view the map with the following code:
the_map = gis.map( location="Lost Maples State Park, Vanderpool, TX", mode='3D' ) the_map.basemap='satellite' the_map.add_layer(soils_overlay) the_map
-
Click on a point in your map to ensure the map unit data has been added to your feature.
4.2 Raster Sampling
[Still under construction…]
-
Create a new notebook in AGOL with the pre-configured login code.
-
Import the Salamander points and extract its layer to a variable named
salamander_lyr
. -
Import the USA NLCD Land Cover image service item and extract its layer to a variable named
nlcd_lyr
. -
From the Raster Analysis Tools menu, select
Sample
to add that tool to your notebook.- First, note that this tool also imports additional modules to your coding session.
- Then, to quickly view the syntax of this tool, replace the
()
at the end with?
and run it.
-
Revert the
?
back to()
and add a new line between the parentheses. Start adding the parameters to run the tool (be sure to modify to fit your naming conventions and folder organization…):from arcgis import raster raster.analytics.sample( input_rasters = [nlcd_lyr], input_location_data=salamander_lyr, unique_id_field = 'FID', statistics_type = 'MEAN', output_name='sample_output_jpfay', folder='Salamanders' )
Run the tool. Did it work? You likely received the error that you exceeded the allowed number of rows and columns. We need to set the extent of the analysis. We can try setting to the extent of the Salamander point dataset…
-
Add a code cell above the “sample” code cell and add the following code:
from arcgis import env env.analysis_extent = salamander_lyr.properties['extent']
-
Run the two code cells, first to set the analysis extent, and then to run the sample tool. Does it work now?
4.3 Watersheds
See if you can run the code to create a feature layer of watersheds for each Salamander point.
4.4 Creating a forest raster from the NLCD
This one is a bit trickier as we need to dig deeper into Imagery datasets.