Geospatial Data Analytics - Course Synopsis

ENV 859 - Geospatial Data Analytics   |   Fall 2024   |   Instructor: John Fay  

Course theme

For many of you, Geospatial Data Analytics may be the last formal GIS class you take. Ever. Ideally, then, by the end of this class you’d have learned everything there is to know about GIS, but of course that won’t be the case: there’s just too much to know, and the technology changes too quickly.

So instead, this course aims to prepare you for “life beyond the classroom”. In other words, I want you to leave this class (and NSOE) with enough know-how and confidence to confront any kind of geospatial challenge and make steadfast progress toward meeting that challenge.

To get there, we’ll cover a set of GIS-related topics, starting with some familiar geoprocessing in ArcGIS Pro and progressing to topics that may be completely new to you. Often we’ll barely introduce a topic before moving on to the next, but this is by design. These quick introductions will expose you to a new facet of GIS and give you enough footing in it to keep learning on your own, if you choose. In the end, you should discover that many technologies that seem completely beyond your grasp often require just a bit of guided curiosity, patience, and determination to learn and use.

I also want to instill in you the notion of geospatial analysis as a branch of the broader data analytics “revolution”. Over the past several years, the explosive growth of large new datasets, along with the computing power to handle them, has expanded the very types of questions we can explore; hence the data analytics revolution. Throughout the course, I will emphasize how each topic fits into broader data analytics, and at the end we will recap how the skills you’ve learned give you greater command to forge raw data into actionable intelligence and informed decisions.


Topic 1: Data Engineering

In geospatial data analytics, data engineering is the cornerstone of the analytical process: it is how raw geospatial data become refined, usable, and coherent. In this topic you will learn to integrate disparate data sources, cleanse and validate records, and harmonize varying spatial formats. Doing so ensures the accuracy, reliability, and consistency of your data, which is the foundation of any sound geospatial analysis. We’ll also look at how to store and retrieve data efficiently, so that your analyses run faster. Together, these skills equip you to navigate complex, real-world spatial challenges with confidence and to derive meaningful insights from geospatial data.
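To make “cleanse, validate, and harmonize” concrete, here is a minimal sketch in plain Python (standard library only). It assumes two hypothetical sources that record coordinates differently, one in decimal degrees and one in degrees-minutes-seconds, converts both to decimal degrees, and then discards records whose coordinates are missing or out of range. The records and helper names are illustrative, not course materials; in practice you might do the same work with ArcGIS Pro tools or a package such as pandas.

```python
import re

# Hypothetical raw records from two "disparate sources": one stores decimal
# degrees, the other degrees-minutes-seconds strings such as 35°59'45"N.
RAW_RECORDS = [
    {"site": "A", "lat": "35.9958", "lon": "-78.9382"},
    {"site": "B", "lat": "35°59'45\"N", "lon": "78°56'18\"W"},
    {"site": "C", "lat": "", "lon": "-78.90"},        # missing latitude
    {"site": "D", "lat": "135.0", "lon": "-78.90"},   # latitude out of range
]

# degrees ° minutes ' seconds " hemisphere
DMS_PATTERN = re.compile(r"""(\d+)°(\d+)'([\d.]+)"([NSEW])""")


def to_decimal_degrees(value):
    """Harmonize a coordinate string to decimal degrees; return None if unparseable."""
    value = value.strip()
    match = DMS_PATTERN.fullmatch(value)
    if match:
        deg, minutes, seconds, hemi = match.groups()
        dd = int(deg) + int(minutes) / 60 + float(seconds) / 3600
        return -dd if hemi in "SW" else dd   # south and west are negative
    try:
        return float(value)                  # already decimal degrees
    except ValueError:
        return None


def clean(records):
    """Cleanse and validate: keep only records with parseable, in-range coordinates."""
    cleaned = []
    for rec in records:
        lat = to_decimal_degrees(rec["lat"])
        lon = to_decimal_degrees(rec["lon"])
        if lat is None or lon is None:
            continue                         # drop incomplete records
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue                         # drop impossible coordinates
        cleaned.append({"site": rec["site"], "lat": round(lat, 4), "lon": round(lon, 4)})
    return cleaned


print(clean(RAW_RECORDS))  # keeps sites A and B; C and D are dropped
```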

Topic 2: Python 101

While ArcGIS Pro’s graphical user interface, or GUI, offers a far friendlier foothold for learning and exploring GIS than, say, a blinking cursor, you may eventually find that clicking menus and dragging boxes around is not the fastest way to get something done. (Imagine building a geoprocessing model that mosaics 348 separate rasters…) Writing scripts is a way around the limitations of the GUI, and Python is the scripting language of choice for writing lines of text that tell ArcGIS to run various tools. Python also has gobs and gobs of other uses and has become one of the most widely used scripting languages out there.

In this section, we introduce the Python scripting language. We start at ground zero, covering the basic concepts: data types, language structure, reading and writing files, iterating through lists, and conditional statements. Basic as they are, these concepts go a long way toward getting you to the point where you can write powerful scripts. And in learning them, we also learn how best to keep learning: where to discover more, how to dig deeper into those discoveries, and how to seek help when problems arise.
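As a taste of how those basics fit together, the short script below touches each one: strings, integers, and lists; writing and reading a text file; a `for` loop; and an `if` statement. The tile file names are hypothetical (think of the raster mosaic scenario above), and nothing here calls ArcGIS.

```python
import os
import tempfile

# Data types: a string, an integer, and an (initially empty) list
prefix = "tile_"          # str
tile_count = 5            # int
tiles = []                # list

# Iterating: build hypothetical raster tile names tile_001.tif ... tile_005.tif
for i in range(1, tile_count + 1):
    tiles.append(f"{prefix}{i:03d}.tif")

# Writing a file: one file name per line, in the system temp folder
list_path = os.path.join(tempfile.gettempdir(), "tile_list.txt")
with open(list_path, "w") as f:
    for name in tiles:
        f.write(name + "\n")

# Reading the file back, with a conditional that keeps only odd-numbered tiles
selected = []
with open(list_path) as f:
    for line in f:
        name = line.strip()
        number = int(name[len(prefix):-len(".tif")])  # "tile_003.tif" -> 3
        if number % 2 == 1:
            selected.append(name)

print(selected)  # ['tile_001.tif', 'tile_003.tif', 'tile_005.tif']
```

Swap the odd-number test for any other condition and the same loop-and-filter pattern scales from five files to hundreds without any extra clicking.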

Alongside Python itself, we’ll examine some essential and some simply useful scripting tools. We’ll write some scripts as IPython (or Jupyter) notebooks and others in the PythonWin IDE, two different but not entirely competing ways of writing and running scripts. We’ll also learn a bit of GitHub, a popular cloud-based platform for hosting, sharing, and collaborating on scripting projects.

Topic 3: Using Python to do GIS

With a bit of Python knowledge under our belt, we return to GIS and explore ESRI’s ArcPy package, a Python library that gives us access to everything ArcGIS has to offer – and more – from Python’s scripting environment. We learn how to use Python to run geoprocessing tools, create and manipulate geometric features, execute raster algebra statements, query and update attributes, and develop entire script-based tools that can be run from ArcGIS Pro itself. We also see how Python’s vast array of open source packages can be coupled with ArcPy to take our analyses in innumerable new directions.
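ArcPy expresses raster algebra through Spatial Analyst `Raster` objects, so an expression such as `Raster("elev") > 1000` is evaluated cell by cell. To show that idea without needing an ArcGIS license, the pure-Python sketch below applies “local” (cell-by-cell) operations to a tiny hypothetical elevation grid, treating `None` as NoData. It illustrates the concept only; it is not ArcPy, and the grid and helper name are assumptions.

```python
# Hypothetical 3x3 elevation grid (meters); None marks NoData cells.
elevation = [
    [950, 1010, 1100],
    [980, None, 1200],
    [900, 1050, 990],
]


def local_op(grid, func):
    """Apply func to every cell, passing NoData (None) through unchanged,
    much as local map algebra operators typically do."""
    return [[None if cell is None else func(cell) for cell in row] for row in grid]


# Reclassify: 1 where elevation exceeds 1000 m, else 0
high_ground = local_op(elevation, lambda z: 1 if z > 1000 else 0)

# Unit conversion: meters to feet, rounded to the nearest foot
elevation_ft = local_op(elevation, lambda z: round(z * 3.28084))

print(high_ground)  # [[0, 1, 1], [0, None, 1], [0, 1, 0]]
```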

Topic 4: GIS in the context of Data Science: “Spatial Data Science”

Data Science is a fast-emerging and “sexy” topic these days, but what is it? Here we’ll discuss what’s behind the movement and the role GIS plays in it. We’ll examine the data science workflow of data engineering, data visualization/exploration, analysis/modeling/scripting, and sharing/collaboration, and we’ll discuss the importance of reproducibility and transparency. We’ll also examine key data structures used in data science, specifically the dataframe and its spatial counterpart, the spatially enabled dataframe, learning how they are constructed from various data sources and used in analyses. It’s here that we’ll take a deep dive into ESRI’s ArcGIS API for Python, a powerful package that links GIS, data science, and our next topic: cloud-based GIS.
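The core idea of a spatially enabled dataframe is a table of attributes plus a geometry column (in the ArcGIS API for Python that column is typically named `SHAPE`). The standard-library sketch below imitates that structure with a list of row dictionaries built from CSV text, then runs a simple bounding-box query. It is a conceptual stand-in, not pandas or the ArcGIS API; the site names, columns, and coordinates are hypothetical.

```python
import csv
import io

# Hypothetical CSV source: an attribute column plus longitude/latitude columns
CSV_TEXT = """name,lon,lat
Site A,-78.98,36.07
Site B,-79.04,35.76
Site C,-78.69,36.01
"""


def read_table(text):
    """Build a dataframe-like table: one dict per row, with x/y folded into a SHAPE column."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["SHAPE"] = {"x": float(rec.pop("lon")), "y": float(rec.pop("lat"))}
        rows.append(rec)
    return rows


def within_bbox(rows, xmin, ymin, xmax, ymax):
    """Spatial query: keep rows whose point geometry falls inside the bounding box."""
    return [
        r for r in rows
        if xmin <= r["SHAPE"]["x"] <= xmax and ymin <= r["SHAPE"]["y"] <= ymax
    ]


table = read_table(CSV_TEXT)
northern = within_bbox(table, -80.0, 36.0, -78.0, 37.0)
print([r["name"] for r in northern])  # ['Site A', 'Site C']
```

A real dataframe adds vectorized operations, indexing, and many readers and writers, but the table-plus-geometry-column shape is the same.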

Time permitting, we’ll also examine non-ESRI, open-source alternatives for incorporating spatial analysis into data science tasks, such as GDAL, GeoPandas, Shapely, Fiona, OSM, and Folium. We may also explore machine learning, artificial intelligence, and image processing from a spatial analysis perspective, and we could examine the spatial analysis tools available in R.
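Open-source libraries such as Shapely and GeoPandas wrap geometric predicates like “does this polygon contain this point?” (Shapely exposes it as a geometry’s `contains` method). To demystify what such a predicate computes, here is a standard-library sketch of the classic ray-casting point-in-polygon test. It is a simplified illustration, not Shapely’s implementation, which relies on the GEOS library and handles edge cases this version ignores; the polygon is hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray from (x, y) toward +x crosses.

    ring is a list of (x, y) vertices; the closing edge back to the first vertex
    is implied. An odd number of crossings means the point is inside. Points
    lying exactly on an edge are not handled consistently by this simple version.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical L-shaped polygon
L_SHAPE = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
print(point_in_polygon(0.5, 2, L_SHAPE))  # True  (inside the vertical arm)
print(point_in_polygon(2, 2, L_SHAPE))    # False (in the notch of the L)
```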

Topic 5: Cloud-based GIS

The paradigm of computing is changing. Rather than downloading datasets to our local machines, we increasingly access remote data services; and rather than crunching analyses on our local CPUs, we tap into remote processing services. In this section, we explore the concept of client-server architecture as it applies to technologies such as ArcGIS Online. We’ll reveal that the ArcGIS API for Python is actually a wrapper for something far more powerful, and likely to become the dominant platform for geospatial analysis in the not-so-distant future.

More specifically, we’ll design, execute, and share spatial analysis workflows using ArcGIS Online. We’ll examine other useful online tools ESRI provides: StoryMaps, Dashboards, and Insights. Then we’ll peek “behind the curtain” of these technologies at the application programming interfaces, or APIs, that drive them, and at how we can control these APIs using Python to automate data downloads, perform spatial analysis, and develop analytical dashboards.
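One place to start peeking behind the curtain: hosted feature layers in ArcGIS Online are exposed through the ArcGIS REST API, and a layer’s `query` endpoint accepts parameters such as `where`, `outFields`, `f` (output format), and, on layers that support paging, `resultOffset` and `resultRecordCount`. The sketch below only builds such request URLs with the standard library so it runs offline; the service URL is a placeholder, and the helper name is hypothetical. Actually sending a request (for example with `urllib.request.urlopen`) is left out.

```python
from urllib.parse import urlencode

# Placeholder (hypothetical) hosted feature layer URL; substitute a real one.
LAYER_URL = "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/Parks/FeatureServer/0"


def build_query_url(layer_url, where="1=1", out_fields="*", offset=0, page_size=1000):
    """Build an ArcGIS REST 'query' request for one page of features as GeoJSON."""
    params = {
        "where": where,                  # SQL-style attribute filter; 1=1 selects everything
        "outFields": out_fields,         # comma-separated field names, or * for all
        "returnGeometry": "true",
        "resultOffset": offset,          # paging: index of the first record to return
        "resultRecordCount": page_size,  # paging: maximum number of records in this page
        "f": "geojson",                  # response format
    }
    return f"{layer_url}/query?{urlencode(params)}"


# URLs for the first three pages of a hypothetical download loop
page_urls = [build_query_url(LAYER_URL, offset=i * 1000) for i in range(3)]
print(page_urls[1])
```

A download script would request these pages in turn until a page comes back with fewer features than `page_size`; check the layer’s own limits, such as its maximum record count, before relying on that stopping rule.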

Time Permitting: Google Earth Engine - or - GIS in R & R-Studio

If any time remains (or if this emerges as a priority among student needs), we will examine one of two topics:

  • Google Earth Engine offers petabytes of spatial data and a speedy analysis platform, all free for research and education! We will take a quick tour of what it can do, using both its web-based Code Editor and its Python API. It will be just enough to get you started doing some amazing analyses.
  • GIS in R & R-Studio: The capacity for geospatial analysis in R and R-Studio is growing rapidly. Here we’ll dabble in the ‘sf’ and ‘rgdal’ packages. We may also quickly examine the role of R Markdown and Shiny in the scripting landscape.