ArcGIS Insights

ENV 859 - Geospatial Data Analytics   |   Fall 2024   |   Instructor: John Fay  

Introduction

ArcGIS Insights is a nifty web-based data wrangling and visualization tool that facilitates data exploration across multiple datasets. Not only is it a powerful tool for making interactive maps, plots, and graphs, but it’s also a good entry point into thinking about data from a data science perspective. That is the focus of this tutorial: getting to know your data, while also making some compelling visuals and learning a useful tool that doesn’t require any additional software to be installed!


Topic                Learning Objective
Background           ♦ Explain what Insights is and what it can do
                     ♦ Describe the components of an Insights workspace:
                        {Workbooks, Models, Datasets, Connections, Reports, Themes}
Getting Started      ♦ Start a new Insights session
                     ♦ Create a new Insights workbook
                     ♦ Add data to your workbook
                     ♦ Rename tables; rename and hide fields
Exploring the data   ♦ Add data stored in AGOL
                     ♦ Enable location
                     ♦ View the data in table form
                     ♦ View the data in a chart

Background

The context for our analysis is to explore the relationship between open space and social vulnerability. We will add two datasets to an Insights workbook and then construct a few interactive graphs and plots that may or may not be revealing. Finally, we’ll add these to a StoryMap and share our plots with others.

Datasets

Final product


Getting Started

Open Insights and Create a new workbook

Step one is to create a new Insights workbook, which is where we add and explore our data.

  • Sign in to Insights in ArcGIS Online: https://insights.arcgis.com (more info)
  • Create a new Workbook

    • Click the Workbook item on the left menu area.

    • Click the New Workbook button in the upper right.

Add the Open Space data

Now we’ll add the block-group-level open space data to our workbook.

  • Select Duke University (under the Connections section)

  • Switch the search scope from My Content to My Organization

  • Search for "Open Space" type:feature owner:kmk85_dukeuniv

  • View the details for Open_Space_ES_Priorities_Block_Groups

  • Click the check mark associated with the service to view its datasets

  • Click Add to add the one and only layer to your workbook

    :point_right: When the dataset is added, it will be listed and often a “card” will be added to your workbook displaying the data in some form. Because the data we just added is a spatial dataset, the card displays the data as a map. You can close this card for now, to avoid constant drawing of the features.

Save your workbook

  • Rename your workbook as “OpenSpaceAnalysis_Fall2024_<netID>” (replacing <netID> with your Duke NetID) by typing over its current name of “Untitled Workbook” by the Insights icon in the upper left.

  • Save the workbook by clicking the :floppy_disk: icon.


Exploring the data

With the dataset added, we’ll see in what ways we can explore and wrangle our dataset.

1. Get to know your data

Before working with any dataset, you should get to know it so that you don’t misuse it. This is not the focus of this particular tutorial, but you should understand who created the dataset and for what purpose. What assumptions were made in collecting it? Could there be any biases in its collection? Could these biases affect your conclusions?

On a more operational level, you’ll need to know what fields are included in your dataset and what they represent. Oftentimes, fields are renamed something cryptic to enable them to work in the software, but properly created datasets will have metadata that describe what these names are and what they actually mean. The metadata for this dataset is located here.

→ View the metadata descriptions for the fields in the OpenSpace dataset and understand to what they refer. We’ll focus our analysis on the overall Benefits Score (the BenScore field).

2. Tidy the Open Space dataset

Let’s first tidy up our dataset to suit our analysis.

  • Rename the table Open Space Benefit - Block Group

  • Select and hide all the fields except: GEOID & BenScore

  • Rename the BenScore field as Benefit Score

    This Benefit Score field is numeric, as detected when we imported the dataset (as indicated by the Σ next to the field). However, the value reflects a rate or ratio (as it has a defined maximum of 1). So we need to update the data type for this field so that its behavior in Insights will change accordingly.

  • Click the data type icon next to the Benefit Score field and change it to Rate/Ratio.

3. Compute new fields in your dataset

Next, we’ll want to isolate records for a particular state and also summarize data by county. The first 5 digits of the GEOID indicate the county FIPS code, and the first 2 represent the state. Let’s create a new field for each.

  • View your data in Tabular Format:
    • Select the Dataset options button (again, the three dots in the upper right corner), and select View Data Table.
  • Add a new field:
    • For the formula, use the LEFT() function to extract the left 2 characters from the GEOID column.
    • Rename the field to ST_FIPS, then Run the function to create the new field.
  • Add another field, setting it to extract the first 5 characters of the GEOID field, and name the output FIPS.
  • Close the table view.
  • Save your workbook.
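
Insights handles this step through its field calculator, but the same slicing logic can be sketched in plain Python. The snippet below is only an illustration, not part of the Insights workflow: the sample rows and the `add_fips_fields` helper are hypothetical, and it assumes each GEOID is a 12-digit block group identifier (zero-padded in case it was read in as a number).

```python
def add_fips_fields(record):
    """Return a copy of the record with ST_FIPS and FIPS derived from GEOID."""
    # Assumption: block group GEOIDs are 12 digits; pad in case a leading zero was lost
    geoid = str(record["GEOID"]).zfill(12)
    return {
        **record,
        "ST_FIPS": geoid[:2],  # same idea as LEFT(GEOID, 2)
        "FIPS": geoid[:5],     # same idea as LEFT(GEOID, 5)
    }


# Hypothetical sample rows (values invented for illustration)
rows = [
    {"GEOID": "370630001011", "BenScore": 0.42},
    {"GEOID": 10010201001, "BenScore": 0.17},  # numeric GEOID with its leading zero dropped
]

tidy = [add_fips_fields(r) for r in rows]
for r in tidy:
    print(r["ST_FIPS"], r["FIPS"])
```

Keeping the FIPS values as text rather than numbers preserves leading zeros, which matters for states such as Alabama (01).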

4. Explore your data via “cards”

  • Drag the ST_FIPS field into your workspace and drop it onto the Table icon that appears, and then onto Summary Table. This creates a new “card” showing a summary table of the ST_FIPS values.

    • The default summary field is Count but you can change this to another summary statistic or to another field. Switch to the Open Space Benefit Score field and select Sum.
  • Change your reference table to a bar chart by selecting the little bar plot icon at the top of the card.

    Alternatively, you can select the Benefit Score and ST_FIPS fields, drag them onto your page, and select the Chart > Column Chart option. The default is to display the sum of the benefit score by state.

    Which state has the highest cumulative Open Space Benefit Score? What does this mean?

  • In the Y axis of your chart, switch the value being shown from SUM to AVG.

    Does this change how the data are interpreted? Why?

    What if we choose MEDIAN instead of AVG?

  • Click the up/down arrows at the top of the card to sort your data in either ascending or descending order.

    Is it helpful to sort the data? Why or why not?

  • Click the zig-zag line with an arrow at the top of the card and add upper and lower quartiles to your plot.
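
To see why switching between SUM, AVG, and MEDIAN can change the story, here is a small plain-Python sketch of the same kind of summary. It is an illustration under stated assumptions, not what Insights runs internally: the rows are invented and `summarize_by_state` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_state(rows):
    """Group Benefit Score values by ST_FIPS and compute summary statistics."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["ST_FIPS"]].append(r["Benefit Score"])
    return {
        st: {"count": len(v), "sum": sum(v), "avg": mean(v), "median": median(v)}
        for st, v in groups.items()
    }


# Invented example: state 37 has more block groups, state 45 has higher scores
rows = [
    {"ST_FIPS": "37", "Benefit Score": 0.2},
    {"ST_FIPS": "37", "Benefit Score": 0.4},
    {"ST_FIPS": "37", "Benefit Score": 0.9},
    {"ST_FIPS": "37", "Benefit Score": 0.1},
    {"ST_FIPS": "45", "Benefit Score": 0.7},
    {"ST_FIPS": "45", "Benefit Score": 0.8},
]

summary = summarize_by_state(rows)
for st, stats in sorted(summary.items()):
    print(st, stats)
```

In this made-up sample, state 37 has the larger sum simply because it has more block groups, while state 45 has the higher average and median. That is the kind of difference the questions above are asking you to notice.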

5. Explore your data via maps

  • Click the Benefit Score field and drag it onto your page, selecting Map as the type of card.

  • Use the filter icon on the top of the card to filter records for just NC block groups (“ST_FIPS = 37”).

    • Click the :house: icon to zoom to the extent of the filtered records (i.e., North Carolina).

    • Note that this creates a new virtual table in your workbook. (Virtual tables are shown in orange.)

  • Adjust the symbology via the layer’s Layer Options tab in your map card.

    • Alter the classification from Natural Breaks to Quantile
    • Change the color scheme of your map and set the outline thickness to 0.
  • Change the basemap used to Dark Grey Canvas via the basemap icon in the upper left region of the workbook.

    Do you see any spatial patterns in your map? Clusters of regions with high benefit scores?

  • Add the legend to your page…

  • Rename the table view created when you filtered your data to: “NC Open Space Benefits”

  • Also, rename the map card created to “NC Open Space Benefits - Block Groups”
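
The filter and the Quantile classification used above can be sketched in plain Python to show what they do to the data. This is a simplified, rank-based illustration rather than the method Insights itself uses (which computes class break values); the scores are invented and `quantile_classes` is a hypothetical helper.

```python
def quantile_classes(values, n_classes=5):
    """Assign each value a class 0..n_classes-1 so classes hold roughly equal counts.

    Simplification: classes are assigned by rank, so tied values may be split
    across neighboring classes.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(values)
    return classes


# Invented rows: ten NC block groups plus one from another state
nc_scores = (0.05, 0.9, 0.3, 0.6, 0.15, 0.75, 0.45, 0.2, 0.85, 0.5)
rows = [{"ST_FIPS": "37", "Benefit Score": s} for s in nc_scores]
rows.append({"ST_FIPS": "45", "Benefit Score": 0.4})

# Equivalent of the "ST_FIPS = 37" filter
nc = [r for r in rows if r["ST_FIPS"] == "37"]

classes = quantile_classes([r["Benefit Score"] for r in nc])
for r, c in zip(nc, classes):
    print(r["Benefit Score"], "-> class", c)
```

Unlike Natural Breaks, which places class boundaries at gaps in the data, a quantile scheme puts about the same number of features in each class, so every color appears on the map roughly equally often.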


Analysis: Spatial data aggregation

Insights has some analytical capability as well. We’ll explore how that’s done here, with an example that spatially aggregates our block-group-level data to features in a second dataset that we’ll bring in. That second dataset will be US Counties, which is a bit silly since our data can be aggregated via the county FIPS attribute, but it still gets the point across.

1. Find and add the county boundaries dataset

  • Add a US Counties layer to your workspace
    • Click the Add Data (+) icon to open the Add Data dialog box
    • Select Duke University under the Connections section on the left
    • Change My Content in the dropdown list to Boundaries
    • Search on "USA Counties" (include the quotes)
    • Add the “USA Counties Generalized Boundaries” dataset to your workbook
    • Close the map card that appears when your dataset is added
  • Filter for just NC Counties
    • Expand the attribute list for the Counties dataset in the contents area
    • Click the funnel icon to the left of the State Name field
    • Filter for just North Carolina counties
  • Rename the dataset to “NC Counties”

  • Save your workbook

2. Spatially aggregate the block group data by this new feature

  • Drag the NC Counties dataset on top of the NC Open Space Benefits map
    • Select Spatial aggregation
    • Area layer: NC Counties
    • Layer to summarize: NC Open Space Benefits
    • Style by: AVG Benefit Score
    • Run the tool
  • Save the data by selecting Copy to Workbook
    • Rename the table “Avg Benefit by NC County”
    • Set the data types of the Avg Benefit Score and People per square mile to Rate/Ratio
  • Examine the output
    • Display county level data as choropleth map, classified by quantiles
    • View the associated data table
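
Conceptually, spatial aggregation decides which area each feature belongs to and then summarizes the features per area. The plain-Python sketch below illustrates that idea using a point-in-polygon test on centroids. It is a simplification under stated assumptions: the geometries and scores are invented, the helper names are hypothetical, and the rules Insights applies to polygons that overlap several areas may differ.

```python
from collections import defaultdict
from statistics import mean


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_avg(areas, points):
    """Average the score of all points that fall inside each area."""
    scores = defaultdict(list)
    for px, py, score in points:
        for name, ring in areas.items():
            if point_in_polygon(px, py, ring):
                scores[name].append(score)
                break  # assign each point to at most one area
    return {name: mean(vals) for name, vals in scores.items()}


# Hypothetical "counties" as unit squares, and block group centroids with scores
areas = {
    "County A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "County B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.2, 0.5, 0.2), (0.8, 0.3, 0.6), (1.5, 0.5, 0.9)]

county_avg = aggregate_avg(areas, points)
print(county_avg)
```

Because block groups nest inside counties, grouping on the FIPS field would give the same averages here; the spatial approach is the one that still works when the two layers share no common attribute.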

Creating Interactive Visualizations

  • Create a new analysis page

    • Drag the Avg Benefit by NC County on top of the :heavy_plus_sign: icon next to “Page 1” at the bottom of the screen. This creates a new page in your workbook.
      • Also: rename Page 1 to “NC General Exploration”
    • Rename the page: “Interactive plot of benefit score”

    • Open that page.
  • Create a new map card

    • Drag the Avg Benefit Score field onto the page to create a map card. Change the classification from Natural Breaks to Quantile with 5 breaks.
    • Extract the legend onto the page.
  • Create and style a new scatterplot

    • From the Avg Benefit by NC County dataset, select both People per square mile and Avg Benefit Score and drag them into the workspace. Select Scatterplot.
    • If the People per square mile is not on the X axis, swap axes via the button under the x axis.

    • Drag the Name field on top of the scatterplot. As this is a categorical field, the points will be colored by county.

    • Drag the 2020 Total Population field on top of the scatterplot. As this is a continuous field, it will size the points by population.
  • Interact with the two plots
    • Select points/features on one plot and note the selection updates on the other.
  • Save your workbook

An Example Analytical Workflow

In this example, we explore relationships between the open space benefit scores and social vulnerability. To do this, we need to add social vulnerability data to our workbook (we’ll use the CDC’s Social Vulnerability Index, or SVI, dataset) and link these data with our open space benefit dataset. We can then explore the combined data using the various techniques that Insights allows.

1. Create a new page & add data

  • Create a new page
    • Click the :heavy_plus_sign: icon to create a new page
    • Rename the page “Open Space vs SVI”
  • Add data to the new page
    • Navigate back to the “NC General Exploration” page
    • Click and drag the “Open Space Benefits - Block Groups” data on top of the “Open Space vs SVI” page tab.
    • Filter this table for just North Carolina
  • Add the 2018 SVI data from Esri’s Living Atlas

    • In the Add Data dialog, select Living Atlas as the source to search

    • Search for “CDC Social Vulnerability Index 2018 - USA”

    • Add just the “Socioeconomic Theme - Counties” layer to your workbook
      • keep the map card that is created.
    • Filter the data for North Carolina
  • Drag the Open Space Benefits dataset on top of the SVI map and add it as a second layer

Our benefits data are at a finer spatial scale than our SVI data, so to reconcile the two datasets, we’ll aggregate our benefits data to the spatial scale of the SVI data, just as we did previously in this exercise through spatial aggregation.

Note this is not the only way to link two datasets. As you well know from previous GIS lessons, you can do an attribute join, if your data share a common attribute value (which ours do). Or you can perform a spatial join, if your features line up (which ours do). However for this lesson, we’ll stick with the spatial aggregation as this method will work nicely with most dataset pairs.

  • Click the “Action Button” in the lower left to reveal the various spatial analyses you can do with the layers in your map card.
  • Select “Spatial Aggregation”, set the Area layer to be the SVI data, the layer to summarize is the Benefits layer, and compute the average benefit score.
    • This will create a new data view with the average benefit score attribute added among the list of all of our SVI attributes.
  • Copy the virtual table to your workbook (to make permanent) and rename it as “SVI and Open Space”
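
For comparison, the attribute join mentioned above can be sketched in plain Python: average the block group scores by county FIPS, then attach that average to each county record. This is an illustration only, not an Insights feature walkthrough. The rows are invented, and the field names `FIPS`, `COUNTY`, and `Pct Poverty` are assumptions standing in for whatever the SVI layer actually calls them.

```python
from collections import defaultdict
from statistics import mean


def join_avg_benefit(svi_rows, block_groups):
    """Attach the mean block group Benefit Score to each county row, keyed on FIPS."""
    by_county = defaultdict(list)
    for bg in block_groups:
        county_fips = bg["GEOID"][:5]  # first 5 characters = county FIPS
        by_county[county_fips].append(bg["Benefit Score"])

    joined = []
    for county in svi_rows:
        scores = by_county.get(county["FIPS"])
        # Counties with no matching block groups get None rather than a made-up value
        avg = mean(scores) if scores else None
        joined.append({**county, "Avg Benefit Score": avg})
    return joined


# Invented county (SVI) rows and block group rows
svi_rows = [
    {"FIPS": "37063", "COUNTY": "Durham", "Pct Poverty": 15.0},
    {"FIPS": "37135", "COUNTY": "Orange", "Pct Poverty": 13.0},
    {"FIPS": "37183", "COUNTY": "Wake", "Pct Poverty": 9.0},
]
block_groups = [
    {"GEOID": "370630001011", "Benefit Score": 0.5},
    {"GEOID": "370630001012", "Benefit Score": 0.7},
    {"GEOID": "371350107011", "Benefit Score": 0.3},
]

joined = join_avg_benefit(svi_rows, block_groups)
for row in joined:
    print(row["COUNTY"], row["Avg Benefit Score"])
```

An attribute join like this depends entirely on both datasets sharing a clean, consistently formatted key, which is why spatial aggregation is the more general choice in this lesson.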

2. Create interactive plots of your dataset

Now the fun part: enabling visual interaction of our combined data. We won’t go too deep in this example, but enough to give you an idea of what you can do with Insights.

  • Create a map of counties

    • If you don’t have a map card of the data, create one by dragging the dataset into the workspace and selecting Map.

    • Symbolize your map so that it shows counties as different colors (or another attribute, if you wish).

  • Create a scatterplot of variables, grouped by county

    • In the contents area, select the following two attributes (ctrl-click allows you to select multiple items): Avg Benefit Score and Percent of persons below the poverty estimate.

    • Drag these two into the workspace and select Scatterplot. Swap axes if needed to get the Avg Benefit Score on the y-axis.
    • Drag COUNTY on top of the scatterplot to color points by county.
    • Drag a continuous variable onto the scatterplot to size the points by that value.

    • Note that selecting some of the points in your scatterplot highlights the features (counties) associated with those points in the map view!
  • Create a histogram of a continuous variable

    • Select a continuous variable and drag it into your workspace, selecting to view it as a histogram

    • Note that this too allows for interactive selection of features in both your scatterplot and your map
    • Click the button on top of the scatterplot to enable spatial filters

    • Consider the possibilities for interactive visualization of your data with different and/or additional plots!
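
A scatterplot suggests a relationship visually; one way to put a number on it is a correlation coefficient. The plain-Python sketch below computes Pearson's r for two columns. It is an optional illustration, not a step in the Insights workflow: the values are invented and `pearson_r` is a hypothetical helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented county values: percent below poverty vs. average benefit score
pct_poverty = [9.5, 14.2, 18.1, 22.7, 12.0]
avg_benefit = [0.71, 0.55, 0.48, 0.30, 0.62]

r = pearson_r(pct_poverty, avg_benefit)
print(f"Pearson r = {r:.2f}")
```

A value near -1 or +1 indicates a strong linear relationship and a value near 0 indicates little linear relationship. Either way, correlation alone says nothing about cause.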

Sharing your work

Since Insights runs entirely on online resources, it’s relatively easy to share your work.

  • In the upper right corner of the page you want to publish, select the Publish button
  • In the Publish dialog, select Report as the type, add tags, a description and specify with whom you want to share the item.
  • You are then presented with some links.
    • The first link - “View your report” - is a link you can share with others who can view your dashboard, but not edit it.
    • The second link - “Access your report item” - links to the item page for your dashboard, allowing others to view or edit it, based on their permissions
    • The third link - “Embed” - allows you to insert your dashboard into another web page, e.g. a StoryMap

What’s Next

We’ve only touched on the very basics of creating an Insights dashboard. There are numerous other datasets, analyses, and visualizations to play with. But now you have the basics to help you understand this tool’s abilities and some basic knowledge of how to get started.